Leonard, please send me a personal email with iText's proposal ($) for solving this problem.
I think you already understand, but in short: get all the images from a PDF and turn them into BufferedImage objects. ty.


Leonard Rosenthol-3 wrote:
>
> Text extraction is difficult too - it just so happens that someone has
> written the code already and contributed it back, so that you can take
> advantage of it. Before then, it was quite a nightmare.
>
> Now you have a requirement (for your job, I assume) and the code doesn't
> exist so you need to write it. If you are unable to do so, due to lack of
> knowledge, time, etc. then perhaps your company should consider hiring
> iText Software to write it for you. I am sure they would be glad to do it -
> and in less time than you've already spent.
>
> The alternative - and there isn't any shortcut - is for you to learn about
> PDF and image formats...
>
> Leonard
>
> -----Original Message-----
> From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
> Sent: Tuesday, February 23, 2010 4:07 PM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] Using Images extracted from a pdf
>
>
> i do not want to transform my pages into images..
>
> it is like this..
> i will extract the text of my pdf..
> that will be easy..
>
> and in the same action.. i will extract all images from this pdf..
> apply OCR to the images.. to extract the text of each image.
>
> so ...
> i really need to get all the images in the pdf..
> i managed to get the raw bytes of an image..
> now i need to transform them into a valid java.awt.Image or BufferedImage
>
> Mike Marchywka-2 wrote:
>>
>> You can always use the command line tool in pdf toolkit or xpdf,
>> I can't remember which but there is something like
>> pdf2image similar to pdf2text to extract text.
>>
>> ----------------------------------------
>>> Date: Tue, 23 Feb 2010 12:43:28 -0800
>>> From: fernandogomes...@hotmail.com
>>> To: itext-questions@lists.sourceforge.net
>>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>>
>>>
>>> I'm going crazy with it.
>>> As you can see, I have never manipulated images at such a low
>>> level, and I do not have much sense of how things work. I have been
>>> searching for days to finish my solution, and I'm already getting stressed.
>>> I keep testing methods .. I try one .. and then try another
>>> option..
>>> -.-
>>>
>>> can you give me some more assistance on how I can turn this array of
>>> bytes back into an image?
>>>
>>> couldn't there be just one class in the API that does it? :P
>>>
>>> Pdfimages buf = new pdfimages (myRawImageByteArray);
>>> buf.getAsBufferedImage ();
>>>
>>> :P
>>>
>>> if you say you cannot help me, all right, but can you point me to some
>>> material I can rely on to get this done?
>>>
>>> thanks.
>>>
>>>
>>> Leonard Rosenthol-3 wrote:
>>>>
>>>> The image is decompressed and then "injected" into the PDF. Same with
>>>> EVERY TYPE of image EXCEPT JPEG.
>>>>
>>>> -----Original Message-----
>>>> From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
>>>> Sent: Tuesday, February 23, 2010 3:21 PM
>>>> To: itext-questions@lists.sourceforge.net
>>>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>>>
>>>>
>>>> ty ..
>>>>
>>>> I have a question.
>>>> when I insert an image that is not a JPEG,
>>>> what exactly happens to it?
>>>>
>>>> say it is a PNG: is it decompressed to be "injected" into the PDF?
>>>>
>>>> or does it keep its PNG format, with the bytes encoded with
>>>> FlateDecode ..
>>>>
>>>> then it would just be a matter of finding the filter and decoding it.
>>>>
>>>> and if the image is decompressed before being inserted into the PDF, how do I
>>>> know which encoding the image uses?
>>>>
>>>>
>>>> Leonard Rosenthol-3 wrote:
>>>>>
>>>>> Bits per pixel is the BitsPerComponent value in the image object
>>>>>
>>>>> Pixels per line (POR LINHA) is _NOT_ Width * bits. It's Width *
>>>>> NumComponents, where NumComponents is based on the colorspace in
>>>>> question (eg. RGB == 3, CMYK == 4).
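
To make the row arithmetic in the reply above concrete, here is a minimal Java sketch. The class name `RowLayout` and both methods are hypothetical helpers written for illustration; they are not part of iText or the JDK. The byte count assumes that each row of samples starts on a byte boundary, which is how PDF image data is laid out.

```java
/** Hypothetical helper: row sizes for raw (already decoded) PDF image samples. */
final class RowLayout {
    private RowLayout() {}

    /** Samples in one row: Width * NumComponents (e.g. RGB = 3, CMYK = 4). */
    static int samplesPerRow(int width, int numComponents) {
        return Math.multiplyExact(width, numComponents);
    }

    /**
     * Bytes in one row. Each sample takes bitsPerComponent bits, and the row
     * is padded up to a whole number of bytes.
     */
    static int bytesPerRow(int width, int numComponents, int bitsPerComponent) {
        long bitsInRow = (long) width * numComponents * bitsPerComponent;
        return Math.toIntExact((bitsInRow + 7) / 8);
    }
}
```

For a 16-pixel-wide RGB image with 8 bits per component, both methods return 48, which is the scanline stride used in the 16 x 16 example quoted further down the thread.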
>>>>>
>>>>> -----Original Message-----
>>>>> From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
>>>>> Sent: Tuesday, February 23, 2010 2:00 PM
>>>>> To: itext-questions@lists.sourceforge.net
>>>>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>>>>
>>>>>
>>>>>> public static BufferedImage createBufferedImageFromRawBytes(byte[] bytes,
>>>>>>         int width, int height, int bits)
>>>>>>         throws BadElementException, MalformedURLException, IOException {
>>>>>>     com.lowagie.text.Image img = com.lowagie.text.Image.getInstance(bytes);
>>>>>>
>>>>>>     DataBuffer db = new DataBufferByte(img.getRawData(), img.getRawData().length);
>>>>>>
>>>>>>     WritableRaster raster = Raster.createPackedRaster(db, // DATA BUFFER
>>>>>>             width,      // LARGURA (width)
>>>>>>             height,     // ALTURA (height)
>>>>>>             width*bits, // LARGURA * BITS POR PIXEL = PIXEL POR LINHA
>>>>>>                         // (width * bits per pixel = pixels per line) -> scanlineStride
>>>>>>             // bits,    // BITS POR PIXEL (bits per pixel) -> pixelStride
>>>>>>             new int [] {bits},
>>>>>>             null);
>>>>>>
>>>>>>     ColorSpace cs = ColorSpace.getInstance(img.getColorspace());
>>>>>>     ColorModel cm = new ComponentColorModel(cs, false, false,
>>>>>>             Transparency.OPAQUE, db.getDataType());
>>>>>>     BufferedImage bi = new BufferedImage(cm, raster, false, null);
>>>>>>     return null;
>>>>>> }
>>>>>>
>>>>>
>>>>> this code is as far as I could get, but there are still variables needed
>>>>> to generate the BufferedImage that I am unsure about. please, could someone
>>>>> help me see whether I'm on track, or whether I wrote something wrong?
>>>>>
>>>>>
>>>>> Fernando Gomes wrote:
>>>>>>
>>>>>> can anyone help me one more time..
>>>>>> i don't know what to do ..
>>>>>>
>>>>>> I need to get the image bytes, now decoded...
>>>>>>
>>>>>> String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
>>>>>>> String filter = pdfStrem.get(PdfName.FILTER).toString();
>>>>>>> int bits = Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
>>>>>>> int width = Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
>>>>>>> int height = Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
>>>>>>> PdfDictionary param = (PdfDictionary)pdfStrem.get(PdfName.DECODEPARMS);
>>>>>>> int colors = Integer.valueOf(param.get(PdfName.COLORS).toString());
>>>>>>> int predictor = Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
>>>>>>> int colums = Integer.valueOf(param.get(PdfName.COLUMNS).toString());
>>>>>>> if(filter.equals("/FlateDecode"))
>>>>>>> {
>>>>>>>     byte[] bytesDecod = PdfReader.FlateDecode(bytes);
>>>>>>
>>>>>> this is all the information that I can pull out of the PDF
>>>>>>
>>>>>> what do I have to do to create my image in general ..
>>>>>> I'm trying to do it, or to learn how, but it is hard; all my attempts have
>>>>>> failed.
>>>>>> ty
>>>>>>
>>>>>>
>>>>>> Fernando Gomes wrote:
>>>>>>>
>>>>>>> Sirs, really sorry for the duplicates; can the other topics be deleted?
>>>>>>> so sorry ..:blush:
>>>>>>>
>>>>>>> many thanks for the help..
>>>>>>> and such good, fast help ..
>>>>>>> i will study more ..
>>>>>>>
>>>>>>>
>>>>>>> Leonard Rosenthol-3 wrote:
>>>>>>>>
>>>>>>>> You are assuming that PDF maintains the PNG nature of the image -
>>>>>>>> that is NOT the case. PDF only supports two kinds of images JPEG (which
>>>>>>>> is why this works) and "raw bitmaps" (aka an array of bits). So in
>>>>>>>> your case, with the PNG, it is transcoded into the latter case and so if
>>>>>>>> you want it back you will need to reverse the process on your end.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> for this response in the other, identical email :blush:
>>>>>>> quote of "1T3XT info" below ..
>>>>>>>
>>>>>>> really thanks.
>>>>>>> I must have overlooked the relevance of the chapter that you
>>>>>>> mentioned; I will read it again, very carefully. My English is very
>>>>>>> weak, and it is very difficult to read.
>>>>>>>
>>>>>>> you are very funny, I laughed a lot. I know I deserved the scolding.
>>>>>>> Really thanks for your help. I will test and then come back to post
>>>>>>> the result.
>>>>>>> Thank you!
>>>>>>>
>>>>>>>
>>>>>>> 1T3XT info wrote:
>>>>>>>>
>>>>>>>> Fernando Henrique Gomes wrote:
>>>>>>>>> the problem is when I insert an image in PNG format and then try
>>>>>>>>> to get the same...
>>>>>>>>
>>>>>>>> OK, we're talking about a PNG.
>>>>>>>> If you've read chapter 10 of the 2nd edition of "iText in Action",
>>>>>>>> you know that PNGs are transformed into zipped pixels.
>>>>>>>> If you didn't know, you should read the book!
>>>>>>>>
>>>>>>>>> on here i try to take that image...
>>>>>>>>>
>>>>>>>>> [code]
>>>>>>>>> int XrefIndex =((PRIndirectReference)obj).getNumber();
>>>>>>>>> PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
>>>>>>>>> PdfStream pdfStrem = (PdfStream)pdfObj;
>>>>>>>>> byte[] bytes = PdfReader.getStreamBytesRaw((PRStream)pdfStrem);
>>>>>>>>> if ((bytes != null)) {
>>>>>>>>>     String fileName = "Image_P"+pageNumber+"_";
>>>>>>>>>     File file = new File(fileName);
>>>>>>>>>     FileOutputStream fw = new FileOutputStream(file);
>>>>>>>>>     fw.write(bytes);
>>>>>>>>>     fw.flush();
>>>>>>>>>     fw.close();
>>>>>>>>>     BufferedImage img2 = ImageIO.read(file);
>>>>>>>>>     com.lowagie.text.Image img =
>>>>>>>>>         com.lowagie.text.Image.getInstance(file.toURL());
>>>>>>>>> }
>>>>>>>>> [/code]
>>>>>>>>>
>>>>>>>>> img2 returned a null !!!!
>>>>>>>>
>>>>>>>> Of course, why do you think that would work???
>>>>>>>>
>>>>>>>>> the line that creates img ..
>>>>>>>>> throws an Exception
>>>>>>>>> "Image_P1_ is not a recognized imageformat"
>>>>>>>>
>>>>>>>> Of course, you're sending iText a bunch of pixels,
>>>>>>>> but: what are the dimensions of the image,
>>>>>>>> how many bits are there per component?
>>>>>>>>
>>>>>>>>> when i try to do :
>>>>>>>>> [code]
>>>>>>>>> Image image = Toolkit.getDefaultToolkit().createImage(bytes);
>>>>>>>>> [/code]
>>>>>>>>>
>>>>>>>>> and then create an image from this image, getting the width and
>>>>>>>>> height from my PdfStream (create a buffered and draw the image),
>>>>>>>>> when i serialize it to a file and view it.. the image is a
>>>>>>>>> fucking black picture .. all black -.-
>>>>>>>>
>>>>>>>> It's because you don't have a fucking clue about what you're doing
>>>>>>>> :P
>>>>>>>> Hehe, I was waiting for an occasion to use the F* word on the list.
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>> if i use JPEG encode for my images.. all the 3 solution i have ..
>>>>>>>>> its ok.. have effects..
>>>>>>>>
>>>>>>>> Well, that's because iText stores JPEGs literally as a JPEG without
>>>>>>>> changing any of the bytes. If you look inside, you'll see that the
>>>>>>>> filter is DCTDecode (Discrete Cosine Transform).
>>>>>>>>
>>>>>>>>> i can visualize my images how to i create then .. perfect..
>>>>>>>>> but if i change the JPEG ... for any other encode.. thats not have
>>>>>>>>> effect ..
>>>>>>>>
>>>>>>>> No idea what you're saying here, but you also need to study images.
>>>>>>>>
>>>>>>>>> can anyone help me plz ?
>>>>>>>>
>>>>>>>> This example doesn't involve iText, but explains what you're
>>>>>>>> missing.
>>>>>>>>
>>>>>>>> Let's create an image byte per byte:
>>>>>>>>
>>>>>>>> byte b[] = new byte[256 * 3];
>>>>>>>> for (int i = 0; i < 256; i++) {
>>>>>>>>     b[i * 3] = (byte) (255 - i);
>>>>>>>>     b[i * 3 + 1] = (byte) (255 - i);
>>>>>>>>     b[i * 3 + 2] = (byte) i;
>>>>>>>> }
>>>>>>>>
>>>>>>>> This is how a PNG, GIF, and some other image types are stored
>>>>>>>> in a PDF, but in zipped format (FlateDecode). These bytes don't
>>>>>>>> make any sense if you don't know the bpc, color space and
>>>>>>>> dimensions.
>>>>>>>>
>>>>>>>> If you want to create an image from these bytes, you could do this:
>>>>>>>>
>>>>>>>> DataBuffer db = new DataBufferByte(b, b.length);
>>>>>>>> WritableRaster raster = Raster.createInterleavedRaster(
>>>>>>>>     db, 16, 16, 48, 3, new int[]{0,1,2}, null);
>>>>>>>> ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
>>>>>>>> ColorModel cm = new ComponentColorModel(
>>>>>>>>     cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
>>>>>>>> BufferedImage bi = new BufferedImage(cm, raster, false, null);
>>>>>>>> ImageIO.write(bi, "bmp", new File("hello.bmp"));
>>>>>>>>
>>>>>>>> In this example, I treat the image as 16 x 16 pixels, using RGB,
>>>>>>>> and converting it to a Bitmap. It's up to you to adapt the example
>>>>>>>> if your image is a GrayScale or CMYK image, or if you want another
>>>>>>>> format.
>>>>>>>>
>>>>>>>> (And please don't post the same question multiple times!!!)
>>>>>>>> --
>>>>>>>> This answer is provided by 1T3XT BVBA
>>>>>>>> http://www.1t3xt.com/ - http://www.1t3xt.info
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Download Intel® Parallel Studio Eval
>>>>>>>> Try the new software tools for yourself. Speed compiling, find bugs
>>>>>>>> proactively, and fine-tune applications for parallel performance.
>>>>>>>> See why Intel Parallel Studio got high marks during beta.
>>>>>>>> http://p.sf.net/sfu/intel-sw-dev
>>>>>>>> _______________________________________________
>>>>>>>> iText-questions mailing list
>>>>>>>> iText-questions@lists.sourceforge.net
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/itext-questions
>>>>>>>>
>>>>>>>> Buy the iText book: http://www.1t3xt.com/docs/book.php
>>>>>>>> Check the site with examples before you ask questions:
>>>>>>>> http://www.1t3xt.info/examples/
>>>>>>>> You can also search the keywords list:
>>>>>>>> http://1t3xt.info/tutorials/keywords/

--
View this message in context: http://old.nabble.com/Using-Images-extracted-from-a-pdf-tp27693711p27714164.html
Sent from the iText - General mailing list archive at Nabble.com.
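
Pulling the thread's advice together, the steps for a non-JPEG image can be sketched with the JDK alone. This is a minimal sketch under stated assumptions, not a complete extractor: the class `RawPdfImage` and its method names are hypothetical, the stream is assumed to use only FlateDecode with no predictor, samples are assumed to be 8 bits per component, and the color space is assumed to be plain gray (1 component) or RGB (3 components). Width, height and the component count would come from the image dictionary, as in the posts above. CMYK, Indexed color spaces, other bit depths and PNG predictors all need extra handling that is not shown.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper, for illustration only. */
final class RawPdfImage {
    private RawPdfImage() {}

    /** Inflates zlib-compressed data, which is what a FlateDecode stream holds. */
    static byte[] inflate(byte[] compressed) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Wraps decoded 8-bit samples in a BufferedImage.
     * Assumption: numComponents is 1 (gray) or 3 (RGB), no predictor applied.
     */
    static BufferedImage toBufferedImage(byte[] samples, int width, int height,
                                         int numComponents) {
        if (numComponents != 1 && numComponents != 3) {
            throw new IllegalArgumentException("only gray (1) and RGB (3) are handled here");
        }
        int stride = width * numComponents; // samples per row = bytes per row at 8 bpc
        if ((long) stride * height > samples.length) {
            throw new IllegalArgumentException("not enough sample data for the given size");
        }
        int[] bandOffsets = new int[numComponents];
        for (int i = 0; i < numComponents; i++) {
            bandOffsets[i] = i; // components are interleaved: R,G,B,R,G,B,...
        }
        DataBuffer db = new DataBufferByte(samples, samples.length);
        WritableRaster raster = Raster.createInterleavedRaster(
                db, width, height, stride, numComponents, bandOffsets, null);
        ColorSpace cs = ColorSpace.getInstance(
                numComponents == 1 ? ColorSpace.CS_GRAY : ColorSpace.CS_sRGB);
        ColorModel cm = new ComponentColorModel(
                cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        return new BufferedImage(cm, raster, false, null);
    }
}
```

A caller would pass the raw stream bytes through `inflate` and then hand the result to `toBufferedImage` together with the Width, Height and component count read from the image dictionary. For a DCTDecode stream the raw bytes are already a JPEG, so, as reported earlier in the thread, they can usually be given to `ImageIO.read` directly instead.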