Re: [iText-questions] Using Images extracted from a pdf

2010-02-25 Leonard Rosenthol
There are eleven(!!) different ways to represent color in a PDF - your image 
extraction will need to support all of them in order to handle any possible 
image that you might present to it.
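
To make those eleven families concrete: the PDF specification (ISO 32000-1, section 8.6) groups them as device (DeviceGray, DeviceRGB, DeviceCMYK), CIE-based (CalGray, CalRGB, Lab, ICCBased) and special (Indexed, Pattern, Separation, DeviceN) color spaces. Below is a minimal Java sketch that reports how many color components one image sample carries for each family. It assumes the family name has already been read from the image's /ColorSpace entry. The class and method names are hypothetical and are not part of iText. For ICCBased and DeviceN the count depends on the color space's own parameters (the /N entry of the ICC profile stream, the number of colorant names), so the caller passes those in. Pattern is listed only for completeness, because an image XObject may not use it.

```java
import java.util.List;

/** Hypothetical helper: color components per image sample for each PDF color space family. */
final class PdfColorSpaces {

    /** The eleven color space families defined by the PDF specification. */
    static final List<String> FAMILIES = List.of(
            "DeviceGray", "DeviceRGB", "DeviceCMYK",       // device families
            "CalGray", "CalRGB", "Lab", "ICCBased",         // CIE-based families
            "Indexed", "Pattern", "Separation", "DeviceN"); // special families

    /**
     * Returns the number of color components in one image sample.
     * iccN is the /N value of an ICCBased profile stream; deviceNNames is the
     * number of colorant names in a DeviceN array. Both are ignored otherwise.
     */
    static int components(String family, int iccN, int deviceNNames) {
        switch (family) {
            case "DeviceGray": case "CalGray": case "Indexed": case "Separation":
                return 1; // an Indexed sample is a single palette index
            case "DeviceRGB": case "CalRGB": case "Lab":
                return 3;
            case "DeviceCMYK":
                return 4;
            case "ICCBased":
                return iccN;
            case "DeviceN":
                return deviceNNames;
            case "Pattern":
                throw new IllegalArgumentException("Pattern is not allowed for image XObjects");
            default:
                throw new IllegalArgumentException("Unknown color space family: " + family);
        }
    }
}
```

For an /Indexed image, components() returns 1: each sample is a palette index, and the palette entries themselves are expressed in the base color space (for example DeviceRGB).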

-----Original Message-----
From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
Sent: Thursday, February 25, 2010 2:44 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Using Images extracted from a pdf


I'm trying to do this, and it seems I'm getting somewhere.
For JPEG and PNG images, the ColorSpace entry comes back as a name:
/DeviceGray, /DeviceRGB or /DeviceCMYK.
But now I'm testing with a BMP, and it returns an array:
"/ColorSpace=[/Indexed, /DeviceRGB, 1, ]"

Why does this happen, and how should I handle it? What does each element
mean, and which one is my real ColorSpace?

Re: [iText-questions] Using Images extracted from a pdf

2010-02-25 Fernando Gomes

I'm trying to do this, and it seems I'm getting somewhere.
For JPEG and PNG images, the ColorSpace entry comes back as a name:
/DeviceGray, /DeviceRGB or /DeviceCMYK.
But now I'm testing with a BMP, and it returns an array:
"/ColorSpace=[/Indexed, /DeviceRGB, 1, ]"

Why does this happen, and how should I handle it? What does each element
mean, and which one is my real ColorSpace?
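
Here is a sketch of what that array probably means, based on the Indexed color space as the PDF specification defines it. The elements are [/Indexed base hival lookup]:

- The base is /DeviceRGB.
- hival is 1, so the palette has hival + 1 = 2 entries.
- The fourth element, which printed as empty above, should be the lookup table: a string (or stream) of (hival + 1) * 3 bytes, one RGB triple per entry.

Each image sample is then a BitsPerComponent-bit palette index, so the "real" ColorSpace is the whole array, not any single element of it. The Java below expands such samples into an RGB BufferedImage. It assumes the stream data has already been decoded (filters undone) and that the lookup bytes have been extracted. The class name is hypothetical.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: expands decoded /Indexed samples into an RGB BufferedImage. */
final class IndexedImages {

    /**
     * @param samples decoded image data (filters already removed), one palette index per pixel
     * @param width   /Width of the image
     * @param height  /Height of the image
     * @param bpc     /BitsPerComponent: 1, 2, 4 or 8
     * @param lookup  palette bytes: (hival + 1) * 3 bytes for a DeviceRGB base
     */
    static BufferedImage toRgb(byte[] samples, int width, int height, int bpc, byte[] lookup) {
        int rowBytes = (width * bpc + 7) / 8;          // every row starts on a byte boundary
        int hival = lookup.length / 3 - 1;
        int mask = (1 << bpc) - 1;
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int bitPos = x * bpc;                  // bit offset of this sample within the row
                int b = samples[y * rowBytes + bitPos / 8] & 0xFF;
                int shift = 8 - bpc - (bitPos % 8);    // samples are packed high bit first
                int index = Math.min((b >> shift) & mask, hival); // clamp out-of-range indices
                int r = lookup[index * 3] & 0xFF;
                int g = lookup[index * 3 + 1] & 0xFF;
                int bl = lookup[index * 3 + 2] & 0xFF;
                out.setRGB(x, y, (r << 16) | (g << 8) | bl);
            }
        }
        return out;
    }
}
```

With hival = 1 and 1 bit per component, this is effectively a two-color bitmap whose two colors come from the lookup table.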


Leonard Rosenthol-3 wrote:
> 
> Text extraction is difficult too - it just so happens that someone has
> written the code already and contributed it back, so that you can take
> advantage of it.  Before then, it was quite a nightmare.
> 
> Now you have a requirement (for your job, I assume) and the code doesn't
> exist so you need to write it.  If you are unable to do so, due to lack of
> knowledge, time, etc. then perhaps your company should consider hiring
> iText Software to write it for you.  I am sure they would be glad to do it -
> and in less time than you've already spent.
> 
> The alternative - and there isn't any shortcut - is for you to learn about
> PDF and image formats...
> 
> Leonard

Re: [iText-questions] Using Images extracted from a pdf

2010-02-24 Fernando Gomes


I need this solution, and I haven't found any alternative other than PDFBox
or PDF Renderer.

With those two I still can't get all the images from a PDF.
If you can help, thank you very much!

Mike Marchywka-2 wrote:
> 
>> Date: Tue, 23 Feb 2010 06:52:54 -0800
>> From: fernandogomes...@hotmail.com
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>
>>
>> Can anyone help me one more time?
>> I don't know what to do.
>>
>> I need to get the image bytes, now decoded...
> 
> Probably the open-source PDF Renderer would answer your questions and
> provide more context. I seem to recall it was pretty easy to modify to
> extract page images in your favorite format; presumably the embedded
> images are extracted in the process of rendering.
> 
>>
>> String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
>> String filter = pdfStrem.get(PdfName.FILTER).toString();
>> int bits = Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
>> int width = Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
>> int height = Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
>> PdfDictionary param = (PdfDictionary) pdfStrem.get(PdfName.DECODEPARMS);
>> int colors = Integer.valueOf(param.get(PdfName.COLORS).toString());
>> int predictor = Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
>> int colums = Integer.valueOf(param.get(PdfName.COLUMNS).toString());
>> if (filter.equals("/FlateDecode"))
>> {
>>     byte[] bytesDecod = PdfReader.FlateDecode(bytes);
>>
>> That is all the information I can get out of the PDF.
>>
>> What do I have to do to create my image from it?
>> I'm trying to do it, or to learn how, but it's hard; all my attempts have failed.
>> Thanks
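
The Predictor, Colors and Columns entries read above point to the missing step. When /DecodeParms has a Predictor of 10 or more, the PDF specification says the Flate-decoded data uses PNG prediction. Every row starts with one filter-type byte (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth), and the remaining bytes are differences that must be undone before they become pixels.

Below is a minimal Java sketch of that reversal. It assumes its input is the already-inflated data (such as the bytesDecod array above) and that BitsPerComponent is taken from DecodeParms (default 8). The class name is hypothetical, and TIFF predictor 2 is not handled.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: undoes PNG prediction (DecodeParms /Predictor 10 to 15). */
final class PngPredictor {

    static byte[] decode(byte[] in, int colors, int bitsPerComponent, int columns) {
        int bpp = (colors * bitsPerComponent + 7) / 8;                // bytes per pixel, rounded up
        int rowBytes = (colors * bitsPerComponent * columns + 7) / 8; // row length without the tag byte
        byte[] prior = new byte[rowBytes];                            // row above; zeros for the first row
        byte[] cur = new byte[rowBytes];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos + 1 + rowBytes <= in.length) {                     // ignore a trailing partial row
            int filter = in[pos] & 0xFF;                              // PNG filter type for this row
            System.arraycopy(in, pos + 1, cur, 0, rowBytes);
            pos += 1 + rowBytes;
            for (int i = 0; i < rowBytes; i++) {
                int left = i >= bpp ? cur[i - bpp] & 0xFF : 0;        // already reconstructed
                int up = prior[i] & 0xFF;
                int upLeft = i >= bpp ? prior[i - bpp] & 0xFF : 0;
                int value = cur[i] & 0xFF;
                switch (filter) {
                    case 0: break;                                    // None
                    case 1: value += left; break;                     // Sub
                    case 2: value += up; break;                       // Up
                    case 3: value += (left + up) / 2; break;          // Average
                    case 4: value += paeth(left, up, upLeft); break;  // Paeth
                    default: throw new IllegalArgumentException("Bad PNG filter type " + filter);
                }
                cur[i] = (byte) value;                                // keep the low 8 bits (mod 256)
            }
            out.write(cur, 0, rowBytes);
            byte[] tmp = prior;                                       // this row becomes the next row's "up"
            prior = cur;
            cur = tmp;
        }
        return out.toByteArray();
    }

    private static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a), pb = Math.abs(p - b), pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        return pb <= pc ? b : c;
    }
}
```

The output drops the tag bytes, leaving rows of plain samples in the layout described by the image's Width, Height, BitsPerComponent and color space (Columns normally equals Width).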
>>
>>
>> Fernando Gomes wrote:
>>>
>>> Sirs, I'm really sorry for posting this more than once; can the other
>>> threads be deleted? So sorry...
>>>
>>> Thanks a lot for the help,
>>> and for helping so quickly.
>>> I will study more.
>>>
>>> Leonard Rosenthol-3 wrote:
>>>>
>>>> You are assuming that PDF maintains the PNG nature of the image - that is
>>>> NOT the case. PDF only supports two kinds of images: JPEG (which is why
>>>> this works) and "raw bitmaps" (aka an array of bits). So in your case,
>>>> with the PNG, it is transcoded into the latter case, and so if you want it
>>>> back you will need to reverse the process on your end.
>>>>
>>>
>>>
>>> This is also my answer to the reply in the other, duplicate email;
>>> the quote from "1T3XT info" is below.
>>>
>>> Really, thanks. I must have only skimmed the chapter you mentioned; I will
>>> read it again, very carefully. My English is very weak, and it is very
>>> difficult for me to read.
>>>
>>> You are very funny, I laughed a lot. I know I deserved the scolding.
>>> Thanks again for your help. I will test this and then come back to post
>>> the result.
>>> Thank you!
>>>
>>>
>>> 1T3XT info wrote:
>>>>
>>>> Fernando Henrique Gomes wrote:
>>>>> the problem is when I insert an image in PNG format and then try to get
>>>>> the same image back...
>>>>
>>>> OK, we're talking about a PNG.
>>>> If you've read chapter 10 of the 2nd edition of "iText in Action",
>>>> you know that PNGs are transformed into zipped pixels.
>>>> If you didn't know, you should read the book!
>>>>
>>>>> Here is how I try to get that image back...
>>>>>
>>>>> [code]
>>>>> int XrefIndex =((PRIndirectReference)obj).getNumber();
>>>>> PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
>>>>> PdfStream pdfStrem = (PdfStream)pdfObj;
>>>>> byte[] bytes =
>>>>> PdfReader.getStreamBytesRaw((PRStream)pdfStrem);
>>>>> if ((bytes != null)) {
>>>>> String fileName = "Image_P"+pageNumber+"_";
>>>>> File file = new File(fileName);
>>>>> FileOutputStream fw = new FileOutputStream(file);
>>>>> fw.write(byt

Re: [iText-questions] Using Images extracted from a pdf

2010-02-24 Fernando Gomes

Leonard, please send me a personal email with a proposal from iText Software
($) to solve this problem.

I think you already understand, but in short:
get all the images from a PDF and turn them into BufferedImages.
Thanks.
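
For JPEG, the one image type that (as Leonard explains elsewhere in this thread) is stored without being transcoded, the stream of an image with /Filter /DCTDecode is essentially a complete JPEG file. Java's built-in ImageIO can therefore often decode it directly.

Below is a minimal sketch. It assumes the caller already has the undecoded stream bytes (for example from PdfReader.getStreamBytesRaw) and has checked the filter. The class name is hypothetical. The built-in JPEG reader is known to reject many CMYK JPEGs, so those may still need separate handling.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: decodes the bytes of a /DCTDecode image stream. */
final class DctImages {

    static BufferedImage decode(byte[] dctBytes) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(dctBytes));
        if (img == null) {   // ImageIO.read returns null when no registered reader recognizes the data
            throw new IOException("Not a JPEG stream that ImageIO can read");
        }
        return img;
    }
}
```

Every other filter needs the longer route of decompressing the pixel data and building the image yourself.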


Leonard Rosenthol-3 wrote:
> 
> Text extraction is difficult too - it just so happens that someone has
> written the code already and contributed it back, so that you can take
> advantage of it.  Before then, it was quite a nightmare.
> 
> Now you have a requirement (for your job, I assume) and the code doesn't
> exist so you need to write it.  If you are unable to do so, due to lack of
> knowledge, time, etc. then perhaps your company should consider hiring
> iText Software to write it for you.  I am sure they would be glad to do it -
> and in less time than you've already spent.
> 
> The alternative - and there isn't any shortcut - is for you to learn about
> PDF and image formats...
> 
> Leonard

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Leonard Rosenthol
Text extraction is difficult too - it just so happens that someone has written
the code already and contributed it back, so that you can take advantage of it.
Before then, it was quite a nightmare.

Now you have a requirement (for your job, I assume) and the code doesn't exist
so you need to write it.  If you are unable to do so, due to lack of knowledge,
time, etc. then perhaps your company should consider hiring iText Software to
write it for you.  I am sure they would be glad to do it - and in less time than
you've already spent.

The alternative - and there isn't any shortcut - is for you to learn about PDF 
and image formats...

Leonard

-----Original Message-----
From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
Sent: Tuesday, February 23, 2010 4:07 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Using Images extracted from a pdf


I don't want to turn my pages into images.

It's like this:
I will extract the text of my PDF.
That part is easy.

And in the same pass, I will extract all the images from this PDF
and apply OCR to them, to extract the text of each image.

So...
I really need to get all the images inside the PDF.
I managed to get the raw bytes of each image;
now I need to turn them into a valid java.awt.Image or BufferedImage.

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Fernando Gomes

I don't want to turn my pages into images.

It's like this:
I will extract the text of my PDF.
That part is easy.

And in the same pass, I will extract all the images from this PDF
and apply OCR to them, to extract the text of each image.

So...
I really need to get all the images inside the PDF.
I managed to get the raw bytes of each image;
now I need to turn them into a valid java.awt.Image or BufferedImage.

Mike Marchywka-2 wrote:
> 
> You can always use the command line tool in pdf toolkit or xpdf,
> I can't remember which, but there is something like
> pdf2image, similar to pdf2text for extracting text.

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Mike Marchywka

You can always use the command line tool in pdf toolkit or xpdf,
I can't remember which, but there is something like
pdf2image, similar to pdf2text for extracting text.

> Date: Tue, 23 Feb 2010 12:43:28 -0800
> From: fernandogomes...@hotmail.com
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] Using Images extracted from a pdf
>
>
> I'm going crazy with this. As you can see, I have never manipulated images
> at such a low level, and I don't have much sense of how things work. I have
> been searching for days for a way to finish my solution, and I'm already
> getting stressed. I keep testing methods, trying one thing and then another...
> -.-
>
> Can you give me some more help on how I can turn this array of bytes
> back into an image?
>
> Couldn't there be just one API class that did it? :P
>
> Pdfimages buf = new pdfimages(myRawImageByteArray);
> buf.getAsBufferedImage();
>
> :P
>
> If you say you can't help me, that's all right, but can you point me to
> some material I can rely on to get this done?
>
> Thanks.

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Fernando Gomes

I'm going crazy with this. As you can see, I have never manipulated images
at such a low level, and I don't have much sense of how things work. I have
been searching for days for a way to finish my solution, and I'm already
getting stressed. I keep testing methods, trying one thing and then another...
-.-

Can you give me some more help on how I can turn this array of bytes
back into an image?

Couldn't there be just one API class that did it? :P

Pdfimages buf = new pdfimages(myRawImageByteArray);
buf.getAsBufferedImage();

:P

If you say you can't help me, that's all right, but can you point me to
some material I can rely on to get this done?

Thanks.


Leonard Rosenthol-3 wrote:
> 
> The image is decompressed and then "injected" into the PDF.  Same with
> EVERY TYPE of image EXCEPT JPEG.

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Leonard Rosenthol
The image is decompressed and then "injected" into the PDF.  Same with EVERY 
TYPE of image EXCEPT JPEG.
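
Since everything other than JPEG goes in as decompressed pixel data, which in the cases discussed in this thread ends up in a /FlateDecode stream (the "zipped pixels" 1T3XT mentions), the first step back is to inflate that stream.

Below is a minimal sketch that uses only java.util.zip. It assumes you start from the raw, still-compressed stream bytes (for example from PdfReader.getStreamBytesRaw). The class name is hypothetical. The result is raw samples, possibly still carrying PNG predictor bytes, and not a PNG file.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical helper: inflates the bytes of a /FlateDecode stream (zlib format). */
final class FlateStreams {

    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();          // expects the zlib header used by PDF Flate streams
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;                           // truncated or unusual stream: keep what we have
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt Flate data", e);
        } finally {
            inflater.end();                          // release native zlib resources
        }
        return out.toByteArray();
    }
}
```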

-----Original Message-----
From: Fernando Gomes [mailto:fernandogomes...@hotmail.com] 
Sent: Tuesday, February 23, 2010 3:21 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Using Images extracted from a pdf


Thanks.

I have a question: when I insert an image that is not a JPEG, what exactly
happens to it?

Say it is a PNG: is it decompressed to be "injected" into the PDF?

Or does it keep its PNG format, with the bytes encoded with FlateDecode? In
that case it would just be a matter of finding the filter and decoding it to
get the image back.

And if the image is decompressed before being inserted into the PDF, how do I
know what encoding the image had?


Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Thread Fernando Gomes

Thanks.

I have a question: when I insert an image that is not a JPEG, what exactly
happens to it?

Say it is a PNG: is it decompressed to be "injected" into the PDF?

Or does it keep its PNG format, with the bytes encoded with FlateDecode? In
that case it would just be a matter of finding the filter and decoding it to
get the image back.

And if the image is decompressed before being inserted into the PDF, how do I
know what encoding the image had?


Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Thread Leonard Rosenthol
Bits per pixel is the BitsPerComponent value in the image object.

Pixels per line (POR LINHA) is _NOT_ Width * bits. It's Width * NumComponents,
where NumComponents is based on the colorspace in question (e.g. RGB == 3,
CMYK == 4).
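A minimal sketch of that raster setup using only the JDK. It assumes the stream bytes have already been fully decoded, that BitsPerComponent is 8, and that the ColorSpace is /DeviceGray or /DeviceRGB (CMYK, /Indexed and the other color spaces need more work). The class and method names are hypothetical. Two deliberate differences from the quoted attempt: the raster is built with `Raster.createInterleavedRaster`, whose parameters (scanlineStride, pixelStride, bandOffsets) match the comments in that code, since no `createPackedRaster` overload takes that argument list; and `ColorSpace.getInstance` is given a JDK constant such as `ColorSpace.CS_sRGB`.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

public class RawSamplesToImage {

    /** Components per pixel for the three device color spaces (hypothetical helper). */
    static int numComponents(String colorSpaceName) {
        switch (colorSpaceName) {
            case "/DeviceGray": return 1;
            case "/DeviceRGB":  return 3;
            case "/DeviceCMYK": return 4;
            default: throw new IllegalArgumentException("not handled here: " + colorSpaceName);
        }
    }

    /** Bytes per row: every row of PDF sample data begins on a byte boundary. */
    static int rowStride(int width, int components, int bitsPerComponent) {
        return (width * components * bitsPerComponent + 7) / 8;
    }

    /** Wraps fully decoded 8-bit Gray or RGB samples in a BufferedImage. */
    static BufferedImage toImage(byte[] samples, int width, int height, int components) {
        if (components != 1 && components != 3) {
            throw new IllegalArgumentException("only 1 (Gray) or 3 (RGB) components in this sketch");
        }
        int scanlineStride = rowStride(width, components, 8); // = width * components for 8 bpc
        int[] bandOffsets = new int[components];
        for (int i = 0; i < components; i++) {
            bandOffsets[i] = i;                                // samples are interleaved: R, G, B, R, G, B, ...
        }
        DataBuffer db = new DataBufferByte(samples, samples.length);
        WritableRaster raster = Raster.createInterleavedRaster(
                db, width, height, scanlineStride, components, bandOffsets, null);
        ColorSpace cs = ColorSpace.getInstance(
                components == 1 ? ColorSpace.CS_GRAY : ColorSpace.CS_sRGB);
        ColorModel cm = new ComponentColorModel(cs, false, false,
                Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        return new BufferedImage(cm, raster, false, null);
    }

    public static void main(String[] args) {
        // 2 x 1 /DeviceRGB image: a red pixel followed by a blue pixel.
        byte[] samples = {(byte) 255, 0, 0, 0, 0, (byte) 255};
        BufferedImage img = toImage(samples, 2, 1, numComponents("/DeviceRGB"));
        System.out.printf("%06X %06X%n",
                img.getRGB(0, 0) & 0xFFFFFF, img.getRGB(1, 0) & 0xFFFFFF); // FF0000 0000FF
    }
}
```

For depths other than 8, `rowStride` gives the general byte count per row, ceil(Width * NumComponents * BitsPerComponent / 8); with 8-bit components it reduces to Width * NumComponents.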

-Original Message-
From: Fernando Gomes [mailto:fernandogomes...@hotmail.com] 
Sent: Tuesday, February 23, 2010 2:00 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Using Images extracted from a pdf




>  public static BufferedImage createBufferedImageFromRawBytes(byte[] bytes,
>          int width, int height, int bits)
>          throws BadElementException, MalformedURLException, IOException {
>      com.lowagie.text.Image img = com.lowagie.text.Image.getInstance(bytes);
>
>      DataBuffer db = new DataBufferByte(img.getRawData(), img.getRawData().length);
>
>      WritableRaster raster = Raster.createPackedRaster(db, // data buffer
>              width,          // width
>              height,         // height
>              width * bits,   // width * bits per pixel = pixels per line (POR LINHA) -> scanlineStride
>              bits,           // bits per pixel -> pixelStride
>              new int[] {bits},
>              null);
>
>      ColorSpace cs = ColorSpace.getInstance(img.getColorspace());
>      ColorModel cm = new ComponentColorModel(cs, false, false,
>              Transparency.OPAQUE, db.getDataType());
>      BufferedImage bi = new BufferedImage(cm, raster, false, null);
>      return null;
>  }
> 
> 

This code is as far as I could get, but I'm not sure about some of the
variables needed to generate the BufferedImage. Could someone please help me
see whether I'm on track, or whether I wrote something wrong?



Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Thread Fernando Gomes

I tried this alternative, and I saw that PDF Renderer has classes that
handle images, but I still cannot do the same thing here: get all the images
from a PDF.

I would also prefer not to mix several jars.

Do you have any examples of how to get the images using PDF Renderer, PDFBox
or JPedal?




Mike Marchywka-2 wrote:
> 
> Probably the open source PDF Renderer would answer your questions and
> provide more context. I seem to recall it was pretty easy to modify it to
> extract page images in your favorite format; presumably the included
> images are extracted in the process of rendering, etc.

Re: [iText-questions] Using Images extracted from a pdf

2010-02-23 Thread Mike Marchywka

> Date: Tue, 23 Feb 2010 06:52:54 -0800
> From: fernandogomes...@hotmail.com
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] Using Images extracted from a pdf
>
>
> Can anyone help me one more time?
> I don't know what to do.
>
> I need to get the image bytes, now decoded...

Probably the open source PDF Renderer would answer your questions and provide
more context. I seem to recall it was pretty easy to modify it to extract page
images in your favorite format; presumably the included images are extracted
in the process of rendering, etc.
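Whichever library ends up producing the decoded pixels, saving them "in your favorite format" is a step the JDK already covers. It also explains why `ImageIO.read` returned null in the original question: the file written there held bare stream bytes without any image-format header. A minimal sketch, assuming the pixels are already in a `BufferedImage`; the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class SaveAsPng {

    /** Encodes a decoded image as complete PNG file bytes (signature, chunks, compressed data). */
    static byte[] toPngBytes(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("no PNG writer available for this image type");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFF0000);
        img.setRGB(1, 0, 0x0000FF);

        byte[] png = toPngBytes(img);
        // Unlike a dump of raw stream bytes, these bytes carry a PNG header,
        // so ImageIO.read can recognise them.
        BufferedImage back = ImageIO.read(new ByteArrayInputStream(png));
        System.out.println(back.getWidth() + "x" + back.getHeight()); // 2x1
        System.out.printf("%06X%n", back.getRGB(0, 0) & 0xFFFFFF);     // FF0000
    }
}
```

Bytes produced this way can be written to a file such as Image_P1_.png and opened with `ImageIO.read` or iText's `Image.getInstance`, because they now form a real PNG file rather than bare pixels.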




>
> String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
> String filter = pdfStrem.get(PdfName.FILTER).toString();
> int bits = Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
> int width = Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
> int height = Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
> PdfDictionary param = (PdfDictionary) pdfStrem.get(PdfName.DECODEPARMS);
> int colors = Integer.valueOf(param.get(PdfName.COLORS).toString());
> int predictor = Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
> int colums = Integer.valueOf(param.get(PdfName.COLUMNS).toString());
> if (filter.equals("/FlateDecode")) {
>     byte[] bytesDecod = PdfReader.FlateDecode(bytes);
>
> This is all the information I can extract from the PDF.
>
> What do I have to do, in general, to create my image?
> I'm trying to do it, or to learn how, but it's hard; all my attempts have failed.
> Thanks
>
>
> Fernando Gomes wrote:
>>
>> Sirs, I'm really sorry for the duplicate posts; can the other topics be deleted?
>> So sorry.
>>
>> Thanks very much for the help,
>> and for such fast help.
>> I will study more.
>>
>>
>> Leonard Rosenthol-3 wrote:
>>>
>>> You are assuming that PDF maintains the PNG nature of the image; that is
>>> NOT the case. PDF only supports two kinds of images: JPEG (which is why
>>> this works) and "raw bitmaps" (aka an array of bits). So in your case,
>>> the PNG is transcoded into the latter, and if you want it back you will
>>> need to reverse the process on your end.
>>>
>>
>>
>> (That response was in the other copy of this thread.)
>> My reply to "1T3XT info", who is quoted below:
>>
>> Really, thanks. I must have only glanced at the chapter that you
>> mentioned; I will read it again, very carefully. My English is very weak,
>> so it is very difficult for me to read.
>>
>> You are very funny, I laughed a lot. I know I deserved the scolding.
>> Really, thanks for your help. I will test it and then come back to post
>> the result.
>> Thank you!
>>
>>
>> 1T3XT info wrote:
>>>
>>> Fernando Henrique Gomes wrote:
>>>> The problem is when I insert an image in PNG format and then try to get
>>>> the same image back...
>>>
>>> OK, we're talking about a PNG.
>>> If you've read chapter 10 of the 2nd edition of "iText in Action",
>>> you know that PNGs are transformed into zipped pixels.
>>> If you didn't know, you should read the book!
>>>
>>>> Here is where I try to get that image back:
>>>>
>>>> [code]
>>>> int XrefIndex =((PRIndirectReference)obj).getNumber();
>>>> PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
>>>> PdfStream pdfStrem = (PdfStream)pdfObj;
>>>> byte[] bytes =
>>>> PdfReader.getStreamBytesRaw((PRStream)pdfStrem);
>>>> if ((bytes != null)) {
>>>> String fileName = "Image_P"+pageNumber+"_";
>>>> File file = new File(fileName);
>>>> FileOutputStream fw = new FileOutputStream(file);
>>>> fw.write(bytes);
>>>> fw.flush();
>>>> fw.close();
>>>> BufferedImage img2 = ImageIO.read(file);
>>>> com.lowagie.text.Image img =
>>>> com.lowagie.text.Image.getInstance(file.toURL());
>>>> }
>>>> [/code]
>>>>
>>>> img2 is null.
>>>
>>> Of course, why do you think that would work???
>>>
>>>> And the line that creates img throws an exception:
>>>> "Image_P1_ is not a recognized imageformat"
>>>
>>> Of course, you're sending iText a bunch of pixels,
>>> but: what are the dimensions of the image,
>>> how many bits are there per component?
>>>
>>>> when i try to 

Re: [iText-questions] Using Images extracted from a pdf

2010-02-22 Thread Leonard Rosenthol
You are assuming that PDF maintains the PNG nature of the image; that is NOT
the case. PDF only supports two kinds of images: JPEG (which is why this works)
and "raw bitmaps" (aka an array of bits). So in your case, the PNG is
transcoded into the latter, and if you want it back you will need to reverse
the process on your end.
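Reversing the process for a PNG usually takes two steps, both possible with just the JDK: inflate the /FlateDecode data, and, when the stream's /DecodeParms has a /Predictor of 10 or more, remove the PNG row filters, because each decoded row then still begins with a filter-type byte. This is a sketch under stated assumptions: the class and method names are hypothetical, `colors`, `bitsPerComponent` and `columns` stand for the /Colors, /BitsPerComponent and /Columns values from /DecodeParms, and TIFF prediction (/Predictor 2) is not handled. The result is the plain sample array that Width, Height and the ColorSpace then describe.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlatePngDecode {

    /** Undoes /FlateDecode, which is zlib/deflate compression. */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated stream: keep what was decoded instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /**
     * Reverses PNG prediction (/Predictor 10 to 15): each row is preceded by a
     * filter-type byte, 0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth.
     */
    static byte[] undoPngPredictor(byte[] data, int colors, int bitsPerComponent, int columns) {
        int bpp = Math.max(1, colors * bitsPerComponent / 8);       // bytes per pixel, at least 1
        int rowLen = (colors * bitsPerComponent * columns + 7) / 8; // bytes per row, excluding the filter byte
        int rows = data.length / (rowLen + 1);
        byte[] out = new byte[rows * rowLen];
        byte[] prev = new byte[rowLen];                             // the row above the first row counts as zeros
        for (int r = 0; r < rows; r++) {
            int start = r * (rowLen + 1);
            int type = data[start] & 0xFF;
            byte[] cur = Arrays.copyOfRange(data, start + 1, start + 1 + rowLen);
            for (int i = 0; i < rowLen; i++) {
                int left = i >= bpp ? cur[i - bpp] & 0xFF : 0;      // already reconstructed
                int up = prev[i] & 0xFF;
                int upLeft = i >= bpp ? prev[i - bpp] & 0xFF : 0;
                int pred;
                switch (type) {
                    case 0: pred = 0; break;
                    case 1: pred = left; break;
                    case 2: pred = up; break;
                    case 3: pred = (left + up) / 2; break;
                    case 4: pred = paeth(left, up, upLeft); break;
                    default: throw new IllegalArgumentException("unknown PNG filter type " + type);
                }
                cur[i] = (byte) ((cur[i] & 0xFF) + pred);
            }
            System.arraycopy(cur, 0, out, r * rowLen, rowLen);
            prev = cur;
        }
        return out;
    }

    /** The PNG Paeth predictor: whichever neighbour is closest to left + up - upLeft. */
    private static int paeth(int left, int up, int upLeft) {
        int p = left + up - upLeft;
        int pa = Math.abs(p - left), pb = Math.abs(p - up), pc = Math.abs(p - upLeft);
        if (pa <= pb && pa <= pc) return left;
        return pb <= pc ? up : upLeft;
    }

    public static void main(String[] args) throws DataFormatException {
        // 2 x 2 gray image, 8 bits: row 0 is Sub-filtered, row 1 is Up-filtered.
        byte[] predicted = {1, 10, 20, 2, 5, 5};
        Deflater deflater = new Deflater();
        deflater.setInput(predicted);
        deflater.finish();
        byte[] buf = new byte[64];
        int len = deflater.deflate(buf);
        deflater.end();

        byte[] pixels = undoPngPredictor(inflate(Arrays.copyOf(buf, len)), 1, 8, 2);
        System.out.println(Arrays.toString(pixels)); // [10, 30, 15, 35]
    }
}
```

In the demo, the first row uses the Sub filter and the second the Up filter; after both steps the original gray values 10, 30, 15, 35 come back.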

-Original Message-
From: fernandogomes...@hotmail.com [mailto:fernandogomes...@hotmail.com] 
Sent: Monday, February 22, 2010 2:52 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Using Images extracted from a pdf

Hi, sorry for my bad English; I'm working on improving it.

I need some big help.
With the help of tutorials and code from this mailing list, I managed to
insert images into a PDF and extract them again.

The problem is when I insert an image in PNG format and then try to get the
same image back...

Here is where I do the insert; my file is a PNG-encoded image:
[code]
com.lowagie.text.Image img = com.lowagie.text.Image.getInstance(f.toURL());
if (img != null) {
    document.setPageSize(new Rectangle(img.getScaledWidth(), img.getScaledHeight()));
    document.newPage();

    img.setAbsolutePosition(0, 0);
    cb.addImage(img);
}
[/code]

Here is where I try to get that image back:

[code]
int XrefIndex = ((PRIndirectReference) obj).getNumber();
PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream) pdfObj;
byte[] bytes = PdfReader.getStreamBytesRaw((PRStream) pdfStrem);
if (bytes != null) {
    String fileName = "Image_P" + pageNumber + "_";
    File file = new File(fileName);
    FileOutputStream fw = new FileOutputStream(file);
    fw.write(bytes);
    fw.flush();
    fw.close();

    BufferedImage img2 = ImageIO.read(file);
    com.lowagie.text.Image img =
            com.lowagie.text.Image.getInstance(file.toURL());
}
[/code]

img2 is null, and the line that creates img throws an exception:
"Image_P1_ is not a recognized imageformat"

When I try this instead:

[code]
Image image = Toolkit.getDefaultToolkit().createImage(bytes);
[/code]

and then create a new image from it, using the width and height from my
PdfStream (I create a BufferedImage and draw the image onto it), the picture
I get when I save it to a file and open it is completely black.

If I use JPEG encoding for my images, all three of my solutions work: I can
view my images exactly as I created them. But if I change from JPEG to any
other encoding, none of them works.

Can anyone help me, please?

--
This message was sent on behalf of fernandogomes...@hotmail.com at 
openSubscriber.com
http://www.opensubscriber.com/messages/itext-questions@lists.sourceforge.net/topic.html

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
Check the site with examples before you ask questions: 
http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
