Re: [iText-questions] NPE while Extracting text
Date: Mon, 21 Jun 2010 09:49:44 +0100
From: b...@benshort.co.uk
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] NPE while Extracting text

Thanks very much for this information. Maybe you could offer me some direction on how to solve my problem? I need to parse PDF mobile phone bills. The information I require is the itemized data, which is in a table format. Is this possible with itextpdf?

I know this won't help you, but let's be clear: PDF is NOT the format of choice for DATA or INFORMATION. It is generally about human readability, and while this often has a describable structure, everyone here tells me it is too complicated to include that in the PDF file. If you have a choice, and have a cooperative relationship with the source of the documents, you want an INFORMATION format, not a bunch of pixels. Scraping HTML or PDF is often done by people trying to extract information from artwork, but you always need to make assumptions about the document structure. If you want a robust means to do this, at least work out some conventions with the document authors.

The great leap in information representation in going from pictures to an alphabet is that fonts don't matter. You probably want to extract the text and scrap the font stuff. If text cannot be extracted easily from the PDF itself, you need to reduce it to pixels and then extract it with OCR software. Or, get the document author to include only the important stuff to begin with.

On 19 June 2010 08:44, 1T3XT info wrote:

Ben Short wrote:
subType is /Type3
Does this help identify the problem?

Yes, but it doesn't bring us closer to a solution. Type 3 fonts are user-defined fonts. See for instance: http://itextpdf.com/examples/index.php?page=exampleid=200

In that example, a 'delta'-shaped and a 'sigma'-shaped glyph were defined, corresponding with the characters 'D' and 'S'. However, the example would also have worked if we'd used any other characters.

Another example: we could define a glyph that looks like the symbol for 'The Artist Formerly Known As Prince' to correspond with the character 'P'. That's what Type 3 fonts are about: they can be used when a user needs a glyph that isn't provided in any other font. Therefore it's very hard to extract that content: how are you going to know that the glyph corresponding with 'P' needs to be 'translated' to 'The Artist Formerly Known As Prince'? I don't think there's a UNICODE code point for that glyph. I think you've hit a limitation regarding text extraction in general.

--
This answer is provided by 1T3XT BVBA http://www.1t3xt.com/ - http://www.1t3xt.info

___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
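The remark in the message above, that scraping always means making assumptions about the document structure, can be made concrete with a small sketch. It assumes the text has already been extracted line by line (by whatever extractor works for the file) and that each itemized row has the form "date time number duration cost". That row layout, the class name BillLineParser and the regular expression are all hypothetical and not taken from any real bill.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BillLineParser {
    // Assumed (hypothetical) row layout: "21/06/2010 09:49 07700900123 00:03:12 0.45"
    private static final Pattern ROW = Pattern.compile(
            "^(\\d{2}/\\d{2}/\\d{4})\\s+(\\d{2}:\\d{2})\\s+(\\+?\\d+)\\s+"
            + "(\\d{2}:\\d{2}:\\d{2})\\s+(\\d+\\.\\d{2})$");

    /** Returns the five fields of every line matching the assumed layout; other lines are skipped. */
    public static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line.trim());
            if (m.matches()) {
                rows.add(new String[] {
                    m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> extracted = Arrays.asList(
                "Date Time Number Duration Cost",
                "21/06/2010 09:49 07700900123 00:03:12 0.45",
                "21/06/2010 10:02 +447700900456 00:00:41 0.10");
        for (String[] row : parse(extracted)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

If the bill's fonts are Type 3 fonts without a usable character mapping, the extracted lines will not contain meaningful text and no pattern of this kind will help. That is the limitation described above.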
Re: [iText-questions] iText Optimization
Date: Fri, 11 Jun 2010 22:55:53 -0700
From: thanga...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] iText Optimization

Hi, I have pasted a screenshot of CPU profiling for JasperReports. The report takes about 9 seconds to generate; 0.5 seconds of that is spent instantiating PdfGraphics2D. http://i.imgur.com/arzCI.png

First, any optimization relies on getting the right data structs and representations. This of course is the first half of the data structures and algorithms thinking that precedes local coding and implementation optimizations. You pointed us to a 61k image when a few lines of ASCII text would have been more readable (I'm still fiddling with eog zoom LOL) and more portable and versatile for automated analysis in other places (let's say I wanted to import these results into a bash script and use them as a benchmark against alt codes). This approach is my biggest concern with people in the PDF community: focusing on pictures rather than information. There is nothing wrong with human readability, but just because you have a bloated picture in a standard format doesn't mean you have added any utility to your output. In this case a nice ASCII table would better serve the purpose of information sharing. Something to think about when you make your next work of art that obscures information.

Briefly, I noticed a few things:

1. No lazy initialization is being used.

Often with long, complicated things, you can do some thinking up front and pick a strategy before doing anything. This would involve looking at whatever input parameters are cheap and easy to examine, estimating some parameters, and then initializing everything up front with optimal memory maps etc. I sympathize with your concern, as I have seen code that takes longer to start up than to execute, but in this case, if you expect to do something that takes a while, you can do some order-zero thinking that will pay off in the inner loops.

2. Two instances of AffineTransform() are created (the IDENTITY constant and one in the constructor).

3. Redundant instance variable assignments to false.

I find myself doing this all the time, knowing it is a waste. Not sure why. LOL.

[...] If you change the code around to use an instance variable (by uncommenting the commented lines and making the appropriate change in the inner loop), you'll see at least an order of magnitude increase in speed. (This is because the JVM uses a longer bytecode to address a class variable than it does a local scope variable; I don't think the JIT will optimize it.)

There could be several factors here, but probably the issue is memory coherence. IIRC, results that are not observable don't have to be published by being written to main memory. If you want another thread to see what you are doing, then you need to use the member and declare it volatile. Otherwise, explicit use of a local is more likely to stop the JRE from doing less predictable memory accesses; it is already using the stack a lot. Even if this is all hotspot compiled, the local is likely to create stack-relative code. You could argue that the members should be in a low-level cache somewhere, since well-written code is likely to have most variable references go through this, but that may not help with larger objects etc.

The doAttributes() method, for example, will suffer a little because of this.

Dave
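Point 1 in the message above (lazy initialization) can be sketched as follows. This is not the PdfGraphics2D code, which the thread does not show. LazyTransformHolder is a hypothetical class that only illustrates deferring the creation of an AffineTransform until it is first needed. The sketch is not thread-safe, which is an additional assumption about how such an object would be used.

```java
import java.awt.geom.AffineTransform;

public class LazyTransformHolder {
    // Stays null until the first caller actually needs a transform.
    private AffineTransform transform;

    /** Creates the identity transform on first access instead of in the constructor. */
    public AffineTransform getTransform() {
        if (transform == null) {
            transform = new AffineTransform(); // the no-arg constructor yields the identity
        }
        return transform;
    }

    /** Reports whether the transform has been created yet. */
    public boolean isInitialized() {
        return transform != null;
    }

    public static void main(String[] args) {
        LazyTransformHolder holder = new LazyTransformHolder();
        System.out.println("initialized before use: " + holder.isInitialized());
        holder.getTransform();
        System.out.println("initialized after use: " + holder.isInitialized());
    }
}
```

Whether this saves anything measurable depends on how often the object is constructed without the transform ever being used, so it is worth confirming with the profiler before and after the change.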
Re: [iText-questions] MalformedByteSequenceException
On a quick look, I'm not sure what you expect this to have to do with itext. That is, it isn't obvious that the XML came from a PDF file or was created by itext, etc. Did you look at the bytes going into the parser? In any case, the problem would probably be obvious if you dumped these bytes and had some idea of what the method expects as valid bytes. Is DocumentBuilder something to do with PDF?

    doc = builder.parse(new StringBufferInputStream(sourceURL));

It's early and I may have this wrong, but maybe a more direct relationship to itext would help.

To: itext-questions@lists.sourceforge.net
From: kishore.chitt...@tcs.com
Date: Thu, 10 Jun 2010 11:16:08 +
Subject: [iText-questions] MalformedByteSequenceException

Hi, I am getting a malformed byte sequence exception when I am generating a PDF. Please refer to the sample code and exception stack trace below. Can you please let me know what needs to be done?

Exception

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 3 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:674)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:425)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:916)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2773)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:225)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
    at ExportAlltabsToPdfAction.main(ExportAlltabsToPdfAction.java:61)

---

Sample Code:

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.*;
    import java.net.*;
    import java.io.*;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.StringBufferInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xhtmlrenderer.pdf.ITextRenderer;
    import org.xhtmlrenderer.resource.FSEntityResolver;
    //import com.lowagie.text.DocumentException;

    public class ExportAlltabsToPdfAction {
        public static void main(String args[]) throws Exception {
            try {
                StringBuffer inputFile = new StringBuffer();
                URL yahoo = new URL("http://dfte.ual.com/wiki/index.php/Main_Page");
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(yahoo.openStream()));
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    inputFile.append(inputLine);
                }
                in.close();
                String outputFile = "";
                outputFile = "C:\\AllTabs.pdf";
                OutputStream os = new FileOutputStream(outputFile);
                DocumentBuilder builder;
                builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                FSEntityResolver er = FSEntityResolver.instance();
                builder.setEntityResolver(er);
                Document doc;
                String sourceURL = inputFile.toString();
                System.out.println(sourceURL);
                doc = builder.parse(new StringBufferInputStream(sourceURL));
                ITextRenderer renderer = new ITextRenderer();
                renderer.setDocument(doc, null);
                renderer.layout();
                renderer.createPDF(os);
                os.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
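One plausible cause, offered as a guess and not as a confirmed diagnosis of the poster's input, is the round trip through StringBufferInputStream in the sample code above. That deprecated class keeps only the low eight bits of each character, so any non-ASCII character in the page reaches the parser as a byte that is usually not valid UTF-8. A minimal sketch of parsing the String as characters follows. The class name ParseXmlFromString and the choice to rethrow failures as IllegalArgumentException are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseXmlFromString {
    /** Parses XML held in a String without converting it to bytes first. */
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream goes straight to the parser, so no byte
            // encoding (and no lossy 8-bit truncation) is involved.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p>caf\u00e9 \u20ac</p>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The sample code also reads the page with an InputStreamReader that uses the platform default charset, which may not match what the server sends. Passing the URL's InputStream directly to builder.parse lets the parser detect the encoding itself and avoids both conversions. Either way, the page must still be well-formed XML for the parse to succeed.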
Re: [iText-questions] MalformedByteSequenceException
To: itext-questions@lists.sourceforge.net
From: kishore.chitt...@tcs.com
Date: Thu, 10 Jun 2010 17:30:36 +0530
Subject: Re: [iText-questions] MalformedByteSequenceException

Hi, Can you please guide me on how to resolve this issue, even though it is not related to itext?

The fact that it isn't related to itext doesn't stop people from responding; however, you are likely to get responses like "hire a programmer" at this point :) No one here knows what your input data looks like. You need to validate it in any case for a real app: never assume another server returns good stuff.

Thanks
Regards
Kishore CH
Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2
Date: Fri, 4 Jun 2010 12:51:07 +0200
From: klas.lindb...@val.se
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2

One obvious thing to look at is physical memory and paging. I have a hunch that WLS is more memory-consuming than Tomcat, leaving less for iText, which may cause paging to occur.

"But memory is cheap and my disk is very very very very fast." LOL. This is a problem with almost all apps today; my current frustration is browsers (not just PDF files any more LOL). I'm not even sure there are good diagnostics here other than looking at page faults in Task Manager. I've learned to start swearing once my disk light comes on and I'm not doing overt file IO. LOL

Also, profiling was suggested, and I agree that it is a very good idea to help pinpoint the source of the problem.

/Klas
Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2 MP3
(extra space courtesy of hotmail; tired of editing it out)

Date: Thu, 3 Jun 2010 08:55:14 -0700
From: msto...@autonomy.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText Perfomance Issue on WebLogic 9.2 MP3

That’s the first I’ve heard of it. Can you profile it to see what’s taking so long?

Also, if you want to do comparative tests, perhaps some analysis of your control group would help.

--Mark Storer
Senior Software Engineer
Cardiff.com

import legalese.Disclaimer;
Disclaimer DisCard = null;

From: George Li [mailto:g...@varicent.com]
Sent: Wednesday, June 02, 2010 3:21 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] iText Perfomance Issue on WebLogic 9.2 MP3

Hi, I have a PDF exporting service hosted on WebLogic 9.2 MP3. I find that the Document.add(Element) call is 2-3 times slower than when the PDF service is hosted on Tomcat (on the same machine), especially if the element is a huge one such as a table of 2000 rows. Is there any fix for this problem?
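Before reaching for a full profiler, the comparison described in the message above can be narrowed with a very small timing harness that runs unchanged on both servers. This is only a sketch: AddTimer and timeNanos are hypothetical names, and the call being measured (for example the Document.add(Element) call from the question) has to be wrapped by the caller, including handling of any checked exception it throws.

```java
public class AddTimer {
    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the real service this would wrap the slow call.
        long elapsed = timeNanos(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 2000; i++) {
                sb.append("row ").append(i).append('\n');
            }
        });
        System.out.println("elapsed ms: " + elapsed / 1_000_000.0);
    }
}
```

A single measurement is noisy, so repeating the call several times on each server and comparing the later runs (after JIT warm-up) gives a fairer picture. If the gap persists, a profiler can show where inside the call the time goes.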
Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi
From: psoa...@glintt.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 31 May 2010 11:21:30 +0100
Subject: Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

Why do you think iText is wrong? Post the images for inspection.

Paulo

-----Original Message-----
From: dermoritz [mailto:tantea...@hotmail.com]
Sent: Monday, May 31, 2010 11:17 AM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

I have some problems with image.getDpiX/Y (iText 5.0.2) on some images. These images come from different digital cameras, and WinExplorer shows 240dpi for all of them. I rotated one of them 90° via the Windows XP built-in image viewer. For this rotated image getDpi returns 96dpi! But WinExplorer, Acrobat and MS-Office Picture Manager still show 240dpi for all images. So why can't iText get the correct dpi from them?

You could probably get something like imagemagick and use identify to dump the metadata for the images that work and those that don't. I don't know the formats well enough to know what is possible, but looking at stuff I have from various phones/cameras, it seems there is a resolution entry but also something called Exif that contains more resolution info. I guess it is possible that some images only have it in a non-standard but well-known location.
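To make the "resolution entry" mentioned above concrete, here is a minimal sketch that reads the density fields from a JFIF APP0 segment at the start of a JPEG file. Whether iText 5.0.2 reads this field, and whether the rotated image in question still carries it, are not established in the thread and are assumptions here. JfifDensity is a hypothetical helper, and it deliberately ignores Exif data, which lives in a different (APP1) segment.

```java
public class JfifDensity {
    /**
     * Returns {units, xDensity, yDensity} from a JFIF APP0 segment that
     * directly follows the SOI marker, or null if there is none.
     * units: 0 = aspect ratio only, 1 = dots per inch, 2 = dots per cm.
     */
    public static int[] read(byte[] jpeg) {
        if (jpeg == null || jpeg.length < 18) {
            return null;
        }
        // SOI marker (FF D8) followed by the APP0 marker (FF E0)
        if ((jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8
                || (jpeg[2] & 0xFF) != 0xFF || (jpeg[3] & 0xFF) != 0xE0) {
            return null;
        }
        // Bytes 4-5 hold the segment length; the identifier "JFIF\0" follows
        if (jpeg[6] != 'J' || jpeg[7] != 'F' || jpeg[8] != 'I'
                || jpeg[9] != 'F' || jpeg[10] != 0) {
            return null;
        }
        // Bytes 11-12 hold the version; then units and two big-endian densities
        int units = jpeg[13] & 0xFF;
        int xDensity = ((jpeg[14] & 0xFF) << 8) | (jpeg[15] & 0xFF);
        int yDensity = ((jpeg[16] & 0xFF) << 8) | (jpeg[17] & 0xFF);
        return new int[] {units, xDensity, yDensity};
    }

    public static void main(String[] args) throws java.io.IOException {
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        int[] d = read(data);
        System.out.println(d == null
                ? "no JFIF APP0 segment at start of file"
                : "units=" + d[0] + " x=" + d[1] + " y=" + d[2]);
    }
}
```

Running this on an image that reports correctly and on one that does not would show whether the two differ in this header. If the second file starts with an Exif segment instead, the resolution is stored somewhere a JFIF-only reader would not find it.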
Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi
Date: Mon, 31 May 2010 04:44:47 -0700
From: tantea...@hotmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

Thanks, but looking for the correct metadata myself is not an option. In this LOL case I would have to decide what the correct resolution is, or what the most credible place to look for the resolution is. The only thing I know is that iText doesn't show the correct resolution (though it is probably looking in a correct place for it), while many other programs do show the correct resolution (probably looking in well-known places). Can anyone tell me where iText looks for dpi? And does anybody know where all those other programs look for it? I think all programs only look in one place?!

Generally you want to get the most direct or definitive result you can. Getting a bunch of IIRC answers usually just leads to more problems. It may be a simple matter to just look with a command-line tool rather than getting human input that may or may not help. So OK, you find out that itext looks in place A and product B looks in place C; then what? Usually, if you want to say itext doesn't work, it is easier to have some specific case under which it appears to fail. If you could run identify on the bad images, that may help, and the answer may even be evident (itext forgot to divide by blah). If the people you are asking don't know the answer, they will need to do all this anyway.

--
View this message in context: http://itext-general.2136553.n4.nabble.com/getDpiX-Y-returns-0-or-wrong-value-but-WinExplorer-and-Acrobate-get-Dpi-tp2237115p2237210.html
Sent from the iText - General mailing list archive at Nabble.com.
Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi
Date: Mon, 31 May 2010 04:53:45 -0700
From: tantea...@hotmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] getDpiX, Y returns 0 or wrong value but WinExplorer and Acrobate get Dpi

Edit: I just checked some of the images here: http://www.fileformat.info/convert/image/identify.htm (uses ImageMagick). For all images, "Resolution: 240x240" is shown!

I never hit info links, but you may need to run with -verbose to get all the metadata in detail. Is this resolution (as in DPI) and not pixel dimension?

--
View this message in context: http://itext-general.2136553.n4.nabble.com/getDpiX-Y-returns-0-or-wrong-value-but-WinExplorer-and-Acrobate-get-Dpi-tp2237115p2237219.html
Sent from the iText - General mailing list archive at Nabble.com.
Re: [iText-questions] Spam: Unit testing flattened PDFs
[ after our latest SMTP exchange, notice what hotmail does with just plain text LOL... I'm not sure anyone even tests this stuff ]

Date: Fri, 28 May 2010 09:03:04 -0700
From: msto...@autonomy.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Spam: Unit testing flattened PDFs

Unit testing PDF is Notoriously Difficult.

For just plain pixel compares, I've suggested this before, but if you are really stuck and have resources, consider something like instrumented video compression libraries. That is, the compression relies on isolating things of perceptual interest, like motion vectors for example. Now, ideally, if you could get a result that says "this block is moved over between the two frames", that might be the metric you want.

Ideally, you’d save the coordinates of your various fields and run OCR on your resulting flattened PDF, looking for the correct text in the correct place.

Well, presumably you have the fonts, so you could render the text (e.g. ligatures etc.) and just look for pixel blocks that match. This is a lot easier than general OCR with unknown fonts or sizes (if you can't estimate these a priori you are stuck LOL). Most people who make stuff up have a model described SOMEWHERE, even if they have to absolutely positively remove every trace of it before publishing their standard professional result. It isn't entirely cheating to use this for testing, but you can appreciate how useful it is to those of us who get stuck using your pixel creation too.

Realistically? Umm… ouch. Actually, the pdf.parser.PdfTextExtractor could be Quite Helpful. Yeah… ! Check out SimpleTextExtractingPdfContentStreamProcessor. With a name like that, it must be easy, right?
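The "plain pixel compare" mentioned above can be sketched in a few lines once the flattened page and a reference page have been rendered to images. Rendering is outside this sketch and needs a separate PDF renderer, since iText itself does not rasterize pages. PixelDiff is a hypothetical helper, and counting exactly differing pixels is the simplest possible metric. It is not the block-motion metric suggested in the message.

```java
import java.awt.image.BufferedImage;

public class PixelDiff {
    /** Counts the pixels whose ARGB values differ between two images of equal size. */
    public static int countDifferences(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same dimensions");
        }
        int differences = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    differences++;
                }
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        BufferedImage expected = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage actual = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        actual.setRGB(1, 1, 0xFFFFFF); // simulate one changed pixel
        System.out.println("differing pixels: " + countDifferences(expected, actual));
    }
}
```

An exact comparison like this is brittle against anti-aliasing and renderer changes, so a test would normally allow a small tolerance or compare only the regions where the flattened fields are expected.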
Re: [iText-questions] Blank PDF after it is transfered through SMTP
Date: Wed, 26 May 2010 23:53:55 -0400
From: jgs...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Michael, Your points are well-taken.

Michael wrote: This means that your quoted-printable encoder does not do a thorough job, either because it is buggy or because you have not told it that the data to encode is not text where a single carriage return, a single line feed, and a carriage return line feed combination all mean the same.

This is what I am suspecting as well. Strangely another pdf file (generated by JClass from Quest Software) does not have inflated bytes issue, although it went through exactly the same Java mail code I posted. I will dig further on it.

Did anyone see my earlier link? Unlikely doesn't mean MUST, and I can assure you the PDF is not human readable without a special decoder ring or PDF viewer LOL.

http://tools.ietf.org/rfc/rfc2045.txt

6.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly US-ASCII text, the encoded form of the data remains largely recognizable by humans.

If you keep reading:

Note that many implementations may elect to encode the local representation of various content types directly rather than converting to canonical form first, encoding, and then converting back to local representation. In particular, this may apply to plain text material on systems that use newline conventions other than a CRLF terminator sequence. Such an implementation optimization is permissible, but only when the combined canonicalization-encoding step is equivalent to performing the three steps separately.

Regards,
Jiangang Song
Re: [iText-questions] Blank PDF after it is transfered through SMTP
Or, if you read even more, clearly this is for human-readable content, not binary data. Why are you trying this? It's unlikely that the actual implementation would even care about preserving data against some transformations that could occur in mail transport anyway:

Because quoted-printable data is generally assumed to be line-oriented, it is to be expected that the representation of the breaks between the lines of quoted-printable data may be altered in transport, in the same manner that plain text mail has always been altered in Internet mail when passing between systems with differing newline conventions. If such alterations are likely to constitute a corruption of the data, it is probably more sensible to use the base64 encoding rather than the quoted-printable encoding.
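The RFC's recommendation above can be illustrated with a short round trip. The sketch uses java.util.Base64 (Java 8 and later) and not the JavaMail code from the thread, which is not shown here. Base64RoundTrip is a hypothetical class name. The point is that the encoded text survives a change of line endings, because the MIME decoder skips characters outside the base64 alphabet.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    /** Encodes binary data as MIME base64 (76-character lines separated by CRLF). */
    public static String encode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Decodes MIME base64, ignoring line separators of any convention. */
    public static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Binary payload containing a lone LF, a lone CR, a CRLF pair and high bytes.
        byte[] payload = new byte[200];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) (i * 7);
        }
        payload[10] = 0x0A;
        payload[20] = 0x0D;
        payload[30] = 0x0D;
        payload[31] = 0x0A;

        String encoded = encode(payload);
        // Simulate a mail system rewriting CRLF line ends to bare LF.
        String altered = encoded.replace("\r\n", "\n");
        System.out.println("intact after transport: "
                + Arrays.equals(payload, decode(altered)));
    }
}
```

In a mail client library the same effect is normally achieved by selecting base64 as the Content-Transfer-Encoding for the PDF attachment, so no manual encoding is needed.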
Re: [iText-questions] Blank PDF after it is transfered through SMTP
Date: Thu, 27 May 2010 09:16:38 -0400
From: jgs...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Michael, I attached a pdf (generated by non-iText software) which was received through email after SMTP transfer using quoted-printable.

I still don't get why you insist on using this approach given what the IETF says about it, or what this has to do with itext. The general intent is to encode human-readable information (ASCII) such that it is not modified in a way likely to matter to an intelligent human. The encoding format is designed to make the encoded file human readable, presumably reflecting human-readable target data. Are you just suggesting that itext should support DOS and Linux line endings? Is a viewer expressing a preference for one or the other? Do you have a pdf file received after going through a profanity and patriotism scanner too? I'm genuinely curious now, inquiring minds want to know.

I opened it using Textpad in binary mode. It consistently uses 0A as eol and contains no 0D. I understand that the PDF 1.4 spec does not require such consistency for eol. However, it could be the reason that the Java mail transfer encoder messed up. I will dig more. I realized that I mentioned a commercial pdf software name. It was not intentional. I sincerely apologize if it bothers anyone.

Accurate and relevant information can't bother anyone - do you want us to guess? LOL.

Regards, Jiangang Song

On Wed, May 26, 2010 at 11:53 PM, Jiangang Song wrote:

Michael, Your points are well-taken.

Michael wrote: This means that your quoted-printable encoder does not do a thorough job, either because it is buggy or because you have not told it that the data to encode is not text where a single carriage return, a single line feed, and a carriage return line feed combination all mean the same.

This is what I am suspecting as well.
Strangely, another pdf file (generated by JClass from Quest Software) does not have the inflated-bytes issue, although it went through exactly the same Java mail code I posted. I will dig further on it.

Regards, Jiangang Song
Re: [iText-questions] Blank PDF after it is transfered through SMTP
Then you’ve been REALLY LUCKY – since QP and PDF have never gotten along.

I guess my interest here was in determining how primitive a valid PDF would have to be if it was assured of being ASCII, if that is even possible. If you could write out PDFs with such a constraint they may work better with some other tools, but obviously you'd expect to drop many things and make files even bigger (which is often fine for some intermediate things like object files).

From: Jiangang Song

I appreciate your response. And I was well aware of the RFC spec before I posted. We have been using quoted-printable to transfer PDF for the past 10 years.

I thought my question had just missed something obvious, but I guess if you'd even looked at it you would have done a binary diff first, found the missing high bits and CRLF issues, made histograms of historical PDF files, found no bad chars, recognized the problem, and then just asked if itext can generate pure ASCII pdf files. Not an unreasonable question, see comments above. This is a reason, however, not to always use the latest and greatest features; sometimes they just don't work with the existing stuff.
Re: [iText-questions] Question about converting HTML to PDF
Date: Wed, 26 May 2010 18:30:42 +0300
From: dhryvas...@serena.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Question about converting HTML to PDF

Hi all - I am using iText and I am trying to convert HTML to PDF using this library. It works well for me with simple HTML. The question that I have is: does iText support converting HTML which includes css references and JavaScript to PDF? If I have JavaScript embedded in my HTML, will it work in the generated PDF? Is there a way to do this?

There is a webkit app that does this, http://code.google.com/p/wkhtmltopdf/

I guess I would ask a related question that seems to be answered by the above app: has anyone considered using itext or other tools with browsers such as webkit? Presumably webkit, as an example, knows how to render (almost by definition it is right, as people usually want to copy what they see in [some] browser even if there is no standard that captures quirks and bugs LOL). Apparently webkit generates things like DOMs and structures for drawing, so you could consider several ways to interface or mix and match tools. For example, some JNI interface between your java app and a modified webkit build, OR an intermediate language such that this would do something useful:

    webkit_tool -dump_render_tree http://xxx.com | java -jar my_itext_app pdf_of_web.pdf

I will admit right now that, much like "grep the source code" was my prior answer for everything, webkit (an open-source browser) seems like the answer to everything that involves a browser.

Thanks in advance, Denys
Re: [iText-questions] Image Speed
Although I looked through a few different threads, I couldn't find anything that answered my exact problem (apologies if I missed something). I am creating a PDF document that needs to support many images - upwards of 20 unique images. Generating the PDF takes ~1 second per image (my testing determined images are the bottleneck), which is a problem as I need to generate these PDFs dynamically, and this is just too long to wait :( ... Here is the code I am using for each image:

    image = Image.getInstance(url);
    image.setAbsolutePosition(left, top - depth);
    image.scaleAbsolute(width, depth);
    d.add(image);

About half of the time per image is consumed by image = Image.getInstance(url); - unfortunately, I don't think there's an alternative here, however if anyone has a faster way of doing this, that'd be appreciated.

Does this point to some other machine? Or a local file on disk somewhere? You can of course cache these or maybe compress, but in any case this is not a PDF problem if it is limited by IO.

However, the other time consumption is due to actually adding the images to the document - my question is if there's a way to speed up this step. Of course, if this is the best performance I can expect from iText, that would be great to know too, so that I can start looking into other PDF libraries.

Presumably you'd like to get some indication that the other libraries could be faster by determining that there is a better code or algorithm alternative to that used by itext (native code may be faster too). It could just be that the task takes a lot of work. Often, however, you find that things like memory usage rather than instruction count become the limiting issues: if you have lots of big images, the VM will still thrash them around unless you happen to get lucky. In this case you could probably get tremendous speedups by only keeping what you need in memory and writing completed stuff out as it is ready, etc.
The answer of course is that you should dig down deeper and see what the issue is more precisely. An itext person with knowledge of the code may have some suspects, but you also could have some quirk specific to your system that causes bigger problems.
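If the digging shows that the time in Image.getInstance(url) is mostly network or server latency, one thing to try is fetching the raw image bytes concurrently and only then handing them to iText (Image.getInstance also accepts a byte array). The sketch below is an assumption about how that could be arranged, not something proposed in the thread: the class, method names and thread count are invented, and it deliberately contains no iText calls so that it runs on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Sketch: download image bytes in parallel, keeping the original order. */
class ParallelFetch {

    /** Plain blocking download of one URL (requires Java 9+ for readAllBytes). */
    static byte[] httpFetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the fetcher for every URL on a fixed pool; results come back in input order. */
    static List<byte[]> fetchAll(List<String> urls, Function<String, byte[]> fetcher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<byte[]> task = () -> fetcher.apply(url);
                pending.add(pool.submit(task));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : pending) {
                results.add(f.get()); // blocks until that download has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("image download failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Pass image URLs as arguments; each byte[] could then go to Image.getInstance(byte[]).
        List<byte[]> images = fetchAll(List.of(args), ParallelFetch::httpFetch, 8);
        for (byte[] img : images) {
            System.out.println(img.length + " bytes");
        }
    }
}
```

This only helps if the photo server can serve several requests at once; if the server's disk is the bottleneck, parallel requests may change little.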
Re: [iText-questions] Blank PDF after it is transfered through SMTP
From: js...@hotmail.com
To: itext-questions@lists.sourceforge.net
Date: Tue, 25 May 2010 18:37:45 -0400
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Thank you for pointing out shaved bytes. In fact, this time it is inflated bytes. Comparing the pdf file generated directly and the one transferred through SMTP using content-transfer-encoding: quoted-printable, every 0A is inflated to 0D 0A and every 0D is also inflated to 0D 0A. There is no other difference. Just this minor inflation blows up Acrobat Reader and it shows up as a blank pdf. (There is no such inflation if base64 is used as content-transfer-encoding.)

What exactly are these characters? Why might this make sense with some data types?

So the pdf generated by iText contains either 0A or 0D but not 0D 0A together. Is this by design? Or is it configurable?

I guess if it did this consistently, you could use dos2unix or sed to fix the file.

P.S.: all tests are on the Windows platform. Attached page_numbers.pdf is generated directly and test.pdf is received through email as described above using quoted-printable encoding.

See this for example, and learn to use the IETF or other standards groups for these types of issues: http://tools.ietf.org/rfc/rfc2045.txt

6.6. Canonical Encoding Model

There was some confusion, in the previous versions of this RFC, regarding the model for when email data was to be converted to canonical form and encoded, and in particular how this process would affect the treatment of CRLFs, given that the representation of newlines varies greatly from system to system, and the relationship between content-transfer-encodings and character sets. A canonical model for encoding is presented in RFC 2049 for this reason.

6.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set.
It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly US-ASCII text, the encoded form of the data remains largely recognizable by humans. A body which is entirely US-ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating, and/or line-wrapping gateway.

Date: Mon, 24 May 2010 18:00:38 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Blank PDF after it is transfered through SMTP

Jiangang Song wrote: Or is there anything wrong with my usage of Java mail?

The blank page problem is (as documented in the book) caused by the fact that some applications (such as Java mail?) shave bytes. PDF is a binary file format. You need to transfer it as a binary file. If you open up the PDF with the shaved bytes in a text editor, you'll see that there are lots of question marks. Those are bytes that have lost a bit due to the way you've transferred them. You need to make sure that you transfer the file as a binary file.

-- This answer is provided by 1T3XT BVBA http://www.1t3xt.com/ - http://www.1t3xt.info
Re: [iText-questions] Image Speed
Date: Tue, 25 May 2010 16:39:42 -0700
From: jdebr...@gmail.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Image Speed

Does this point to some other machine? Or a local file on disk somewhere?

This URL points to a photo hosted by our photo server - this is the only way to access these images, unfortunately. However, do you think it would help to compress the image somehow before calling Image.getInstance()? Caching does no good as there will very rarely be repetition in the images used :( ...

Unlikely to be due to transfer time; it could even be disk IO on the server, but that is only a non-itext related suspect.
Re: [iText-questions] Blank PDF after it is transfered through SMTP
From: js...@hotmail.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 24 May 2010 11:54:29 -0400
Subject: [iText-questions] Blank PDF after it is transfered through SMTP

I tried to send a generated PDF through SMTP using the Java mail api. It puzzled me that the content of the PDF once received in email is blank unless the Content-Transfer-Encoding is set to base64. For example,

Do you have any idea what base64 encoding would do? That may be a good place to start. You could for example extract the text of your pdf and probably send that without complication (shameless taunt for response LOL). What do you mean by blank? You opened it in a viewer and got a blank page, or a zero-length file?

Does iText support other Content-Transfer-Encodings like quoted-printable? Or is there anything wrong with my usage of Java mail?

Only by accident would anyone here know anything about mail or SMTP. iText supports PDF. You should look at the PDF spec and the meaning of transfer encoding. Also, the way to determine the answer is to binary diff the two PDF files, before and after going through the mail. You will probably be able to get some idea of what happened. Of course, if you really got a zero-length file that may not be too informative. I should probably know what happened, but well, I try to only send ASCII in mail.

Regards, Jiangang Song
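The binary diff suggested above can be done with any hex tool; for anyone who prefers to stay in Java, here is a rough sketch. The class and method names are invented for illustration. It reports the first differing offset and counts lone CR, lone LF and CRLF sequences, which is the kind of histogram that exposes newline rewriting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Sketch: compare a PDF before and after mail transfer at the byte level. */
class EolHistogram {

    /** Returns counts as {lone CR, lone LF, CRLF pairs}. */
    static int[] count(byte[] data) {
        int cr = 0, lf = 0, crlf = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0D) {
                if (i + 1 < data.length && data[i + 1] == 0x0A) {
                    crlf++;
                    i++; // skip the LF that belongs to this pair
                } else {
                    cr++;
                }
            } else if (data[i] == 0x0A) {
                lf++;
            }
        }
        return new int[] { cr, lf, crlf };
    }

    /** Offset of the first differing byte, or -1 if the arrays are identical (Java 9+). */
    static int firstDifference(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: EolHistogram original.pdf received.pdf");
            return;
        }
        byte[] before = Files.readAllBytes(Paths.get(args[0]));
        byte[] after = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("first difference at offset " + firstDifference(before, after));
        System.out.println("before {CR, LF, CRLF}: " + Arrays.toString(count(before)));
        System.out.println("after  {CR, LF, CRLF}: " + Arrays.toString(count(after)));
    }
}
```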
Re: [iText-questions] text searching + opening a document directly to the search result location
I'm just trying to get some clarity on what each of these features is, both what itext and pdf viewers support and what you are trying to do.

Date: Thu, 20 May 2010 03:35:06 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] text searching + opening a document directly to the search result location

Ok, we've settled the part with the document opening (PdfDestination - LocalDestination - SetOpenAction).

You are asking below about search: do you mean you want the action on open to be a search, or you want it to vary depending on a parameter passed when the viewer was invoked? Do you actually want a table of contents or to do a search?

Now what about the search? I need to find a certain text and then get the coordinates for it so that I can set the PdfDestination.

Is this your own app, or a web app running in a browser that opens up whatever pdf viewer or plugin the user may have? Is your question "how do I use itext to search a pdf document?" This has come up before in the context of "how do I extract text along with its location on screen", and often the response is that it is too complicated since it involves a transformation matrix, but I did manage to get a simple utility to output lines of the form "x y text" for all text in a document. If you want the viewer to do the search, isn't this a viewer question (how do I open a viewer to scroll to the first occurrence of foo)?

I've adopted this approach because the PDF help-file is provided by the customer. He'll be reluctant to make a separate help file for each property. And the help itself is too complex to simply transcribe it myself to HTML.

Well, typing "apropos pdf" I did find on my debian install there is something called pdf2html; not sure how well it works but it may be an option. It depends on what you mean by complex: intricate artwork or some kind of complex interaction logic?
It isn't hard to find lots of big customers who say "gee, pdf is a standard and looks good and the files are huge so there must be loads of information in there, you'd be crazy not to use this."

I'm in the research stage right now, so time is precious. If iText doesn't support these features please tell me so that I may look for another solution elsewhere.

--
Date: Wed, 19 May 2010 09:00:44 -0500
From: Cameron Laird
Subject: Re: [iText-questions] text searching + opening a document directly to the search result location
To: Post all your questions about iText here

On Tue, May 18, 2010 at 6:50 AM, Mike Marchywka wrote:

Date: Tue, 18 May 2010 04:30:48 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] text searching + opening a document directly to the search result location

Hello! In my app I have a table. On each row there is a help button. The help is provided in the form of a large PDF file. If the user presses the help button on a row, the PDF should open directly where the explanation for that row's properties is. Can I do this with iText (actually iTextSharp)? Can I search the document (using the property name on that row) and then open the document to the user, directly at that location?

This is a bit like the prior question about how to use a servlet to deal with big pdf files. The first answer might be, why are you using PDF in this setting? Rather than having a help button provoke a search, wouldn't you be better off doing the search previously, or can help point to arbitrary locations? In the former case, html with fragments may work, or just having separate pages; in the latter case a DB may be more appropriate, although I guess you could ask about PDF indexing or TOC capabilities. Leonard, you care to explain how PDF is a good choice here? Thanks.
My interest of course is that I end up having to use some of these creations that people design...

Victor, it occurs to me that http://www.itworld.com/development/107909/tools-pdf-internal-links might bear on your target.
Re: [iText-questions] text searching + opening a document directly to the search result location
Date: Thu, 20 May 2010 05:27:32 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] text searching + opening a document directly to the search result location

Hello again! Ok, I'll try to make it a bit clearer. Imagine this table:

Property Name | Blah 1 | Blah n | HELP_BUTTON
property 1    |        |        | btnHelp
property 2    |        |        | btnHelp

Is this in a browser or your own app?

If the user presses the btnHelp, the Pdf help-file should appear on the screen. This document has the explanations for all the properties. For example:

1. Property 1 blahblah and more blah.
2. Property 2 blh (tables, diagrams...)

Are the properties and help responses relatively static and known a priori, or quite dynamic?

Because the help is amassed in one file, the client would want this behavior:
- user clicks on property n btnHelp
- pdf document opens scrolled directly to "n. Property n" - much like the bookmark behavior

Well, everyone gets requests like this; you may want to get a better idea of what the actual end product should be from the user's viewpoint. If the concern is general quality of the help text, or there is some unique facility provided by pdf, it may make sense, but users who want information for an immediate need may not need lots of pictures. You probably don't want a tutorial as you are getting ready to submit a form to trade securities, launch a missile, or land a plane.

I found the SetOpenAction method. It executes when the document opens. But it needs some parameters. It needs a LocalDestination. And here I thought that the text searching would come in. I was thinking that:
- get Property Name from the table
- search the "n. property n" string in the PDF
- get the coordinates, the location of the string occurrence

Presumably the search is being done by the viewer, or do you want itext to do this?

- set the Open Action
would do the trick. But can I do that? Any other suggestions or solutions would be welcomed.
Get a better idea of what the user is supposed to experience and write scripts to parse the source pdf into a suitable format, unless pdf is the right choice.

Thank you for your time and patience.

It is easier now than to find out later you have designed a system that I need to use to do something simple - many agencies have been sold on pdf and probably try to do stuff like this all the time. There are some very good pdf products generated each week by govt agencies that have pictorial information, but then there are many submissions from people forced to make public declarations that use pdf as a big way to hide data from automated processing etc etc etc.
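If the property-to-section mapping is static and the help opens in a viewer that honors PDF open parameters, one option that avoids searching at click time is to look up a page number and append a `#page=N` fragment to the help file's URL. This is only a sketch under those assumptions: the class name, the mapping and the file name are invented, the thread's app uses iTextSharp rather than Java, and fragment support varies by viewer, so it needs checking against the viewer the customer actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: turn a property name into a link that opens the help PDF at a known page. */
class HelpLink {

    /** Appends the "#page=N" open parameter; pages are 1-based. */
    static String forPage(String pdfUrl, int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page numbers start at 1");
        }
        return pdfUrl + "#page=" + page;
    }

    /** Looks the property up in a prepared index; unknown properties open the first page. */
    static String forProperty(String pdfUrl, Map<String, Integer> pageIndex, String property) {
        return forPage(pdfUrl, pageIndex.getOrDefault(property, 1));
    }

    public static void main(String[] args) {
        // Hypothetical index, built once (by hand or by a text-extraction pass over the help file).
        Map<String, Integer> pageIndex = new LinkedHashMap<>();
        pageIndex.put("property 1", 3);
        pageIndex.put("property 2", 7);
        System.out.println(forProperty("help.pdf", pageIndex, "property 2"));
    }
}
```

The index is the part that takes work; it moves the search from click time to build time, which is the "do the search previously" idea raised earlier in this thread.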
Re: [iText-questions] iText Read Chuncks of PDF into java
Date: Wed, 19 May 2010 08:52:00 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText Read Chuncks of PDF into java

crimeunit wrote: Dear all, Does somebody else know maybe that I can use another library where I can specially read out the links of content (to another pdf file) into a pdf?

Reading out the links is a completely different question. Links (anchors, hyperlinks, external go to actions,...) are not part of the page content stream; they are stored in Link annotations and very easy to retrieve.

We just discussed this specific issue in another thread. The question ultimately became, from my perspective, do you need to write a custom piece of code to get links, or can you stand on the shoulders of giants, avoid reinventing the wheel and solve the problem with cliches and command line tools such as

    cat xxx.pdf | grep http

or better

    cat xxx.pdf | convert_to_form_suited_for_manipulation | grep $unambiguous_link_thingy all_links

Your problem is that you are not using the correct terminology, therefore it is impossible for anybody to answer your question.

This of course is a very common problem when just starting out, and it makes it hard to do keyword searches. A lot of your time is spent here, but this is hardly unique to itext. A command line tool to dump a pdf in human-readable form (LOL) with the right jargon could make this easier ("I dumped the pdf and all the wazoodalle dictionary entries were blank"). This is why I usually talk around ill-posed questions, time and interest permitting.

I interpreted your question as a request to do something that is impossible: you want to extract structure from a PDF that isn't structured (a PDF that isn't tagged). You won't find any tool that can do that.

If you can convert the PDF to text or pixels or any other thing that may capture structure according to some external pattern, you may be able to use existing text tools or, if this is worth enough effort, OCR tools on pixels.
My recurring complaint is that the FDA does, or has in the past, accepted scanned PDF files for documentation of clinical trial results of approved drugs (look for example at dr...@fda various doc packages). This makes automated usage of this voluminous data impossible; I tried OCR but it didn't work too well. Many people who file govt documents don't like automated data processing, which does make this format a good choice for them. Calling this Accessdata is almost comical, perhaps accesspictures LOL.

http://www.accessdata.fda.gov/scripts/cder/drugsatfda/
http://www.accessdata.fda.gov/drugsatfda_docs/nda/2004/125104s000_Natalizumab_Pharmr_P1.pdf

I did note the labels seem to be selectable, and presumably you could get data out of the cave drawings.
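The "cat xxx.pdf | grep http" idea from this thread can be approximated in Java without any PDF library. The sketch below is a crude scan, not how iText reads Link annotations: it only sees `/URI (...)` entries stored in uncompressed objects, and it does not handle escaped parentheses or hex strings. The class name and regular expression are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: grep-style scan of raw PDF bytes for URI action targets. */
class UriScan {

    // Matches "/URI (target)" as written in a URI action dictionary.
    private static final Pattern URI_ENTRY = Pattern.compile("/URI\\s*\\(([^)]*)\\)");

    static List<String> findUris(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so binary sections cannot break decoding.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> found = new ArrayList<>();
        Matcher m = URI_ENTRY.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (String uri : findUris(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(uri);
        }
    }
}
```

For files that keep their objects in compressed streams this scan finds nothing, and reading the Link annotations through a PDF library is the more reliable route.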
Re: [iText-questions] text searching + opening a document directly to the search result location
Date: Tue, 18 May 2010 04:30:48 -0700
From: victor_ba...@yahoo.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] text searching + opening a document directly to the search result location

Hello! In my app I have a table. On each row there is a help button. The help is provided in the form of a large PDF file. If the user presses the help button on a row, the PDF should open directly where the explanation for that row's properties is. Can I do this with iText (actually iTextSharp)? Can I search the document (using the property name on that row) and then open the document to the user, directly at that location?

This is a bit like the prior question about how to use a servlet to deal with big pdf files. The first answer might be, why are you using PDF in this setting? Rather than having a help button provoke a search, wouldn't you be better off doing the search previously, or can help point to arbitrary locations? In the former case, html with fragments may work, or just having separate pages; in the latter case a DB may be more appropriate, although I guess you could ask about PDF indexing or TOC capabilities. Leonard, you care to explain how PDF is a good choice here? Thanks. My interest of course is that I end up having to use some of these creations that people design...

Thanks, Victor
Re: [iText-questions] iText causing thread stuck
Date: Mon, 17 May 2010 09:02:24 -0700
From: msto...@autonomy.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText causing thread stuck

I googled "stuck executethread", and the discussions I turned up were about a fixed max thread run time that was configurable (and seems to default to 600 seconds, 10 minutes). Your self-tuning thing may be ignoring that in favor of something Fancy. I suspect you need to find a way to tell your server that this thread is going to take a long time, and that's okay... A WebLogic question, not an iText one.

Now if you want to figure out how to speed up iText, that belongs here... But most of the efficiency improvements available are in the IO, while your thread is being halted in what looks like code involved in building your PdfTable in memory, not writing it out. You might throw in some logic to take any table of more than X rows and generate it in separate documents in series, such that you can get each part done reasonably quickly. Once all the portions of the table have been generated, you can stitch the PDFs back together, AT THE PAGE LEVEL. It is, for all practical purposes, impossible to extract rows and append them to existing documents. Pages or bust.

--Mark Storer Senior Software Engineer Cardiff.com
import legalese.Disclaimer; Disclaimer DisCard = null;

-Original Message-
From: stitches [mailto:sarifi...@sbcglobal.net]
Sent: Friday, May 14, 2010 12:21 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] iText causing thread stuck

Hi - I'm relatively new to Java and iText. I took over a project from someone else; we're running it in Weblogic 10.3. We have a dynamic report (dynamic in the sense that the columns and the order of columns are chosen by the end user, and the different columns can have different rowspans), which we would like to export to PDF. If the dataset is relatively small, we have no problems.
But when the table is extremely large, the export hangs in a thread stuck. I really hope you can help me out, this is quite urgent. Below is the error we are receiving. Thanks in advance. Thread-36 [STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' priority=1, DAEMON { com.lowagie.text.pdf.PdfLine.getChunk(Unknown Source) com.lowagie.text.pdf.PdfCell.firstLineRealHeight(Unknown Source) com.lowagie.text.pdf.PdfCell.setBottom(Unknown Source) com.lowagie.text.pdf.PdfDocument.addPdfTable(Unknown Source) com.lowagie.text.pdf.PdfDocument.add(Unknown Source) com.lowagie.text.Document.add(Unknown Source) com.novartis.dra.tap.servlets.CustomPDFGenerator.doPost(Custom PDFGenerator.java:64) com.novartis.dra.tap.servlets.CustomPDFGenerator.doGet(CustomP DFGenerator.java:59) javax.servlet.http.HttpServlet.service(HttpServlet.java:700) javax.servlet.http.HttpServlet.service(HttpServlet.java:815) weblogic.servlet.internal.StubSecurityHelper$ServletServiceAct ion.run(StubSecurityHelper.java:224) weblogic.servlet.internal.StubSecurityHelper.invokeServlet(Stu bSecurityHelper.java:108) weblogic.servlet.internal.ServletStubImpl.execute(ServletStubI mpl.java:198) weblogic.servlet.internal.ServletStubImpl.execute(ServletStubI mpl.java:175) weblogic.servlet.internal.WebAppServletContext$ServletInvocati onAction.run(WebAppServletContext.java:3468) weblogic.security.acl.internal.AuthenticatedSubject.doAs(Authe nticatedSubject.java:308) weblogic.security.service.SecurityManager.runAs(Unknown Source) weblogic.servlet.internal.WebAppServletContext.securedExecute( WebAppServletContext.java:2116) weblogic.servlet.internal.WebAppServletContext.execute(WebAppS ervletContext.java:2038) weblogic.servlet.internal.ServletRequestImpl.run(ServletReques tImpl.java:1372) weblogic.work.ExecuteThread.execute(ExecuteThread.java:198) weblogic.work.ExecuteThread.run(ExecuteThread.java:165) } -- See some of the servlet lists. 
Servlets are meant to be diminuitive (lets) things that run on a server to handle short tractable requests while a remote connection stays open. Even if you could keep the thread alive you may not maintain the connection- what if request comes in over a wireless link or two? You need to change the paradigm. _ The New Busy is not the too busy. Combine all your e-mail accounts with Hotmail. http://www.windowslive.com/campaign/thenewbusy?tile=multiaccountocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_4 -- ___ iText-questions mailing list iText-questions@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/itext-questions Buy the iText book: http://www.itextpdf.com/book/ Check the site with examples before
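Mark Storer's suggestion in this thread, to take any table of more than X rows and generate the parts as separate documents, starts with splitting the row data into batches. A minimal sketch of just that splitting step, in plain Java with no iText calls; the class name RowBatcher and the batch size are assumptions made for illustration, and rendering each batch and merging the results at the page level is left to whatever iText code the application already has:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a large row list into fixed-size batches. */
final class RowBatcher {

    /** Returns consecutive, independent copies of at most maxRows rows each. */
    static <T> List<List<T>> partition(List<T> rows, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += maxRows) {
            int end = Math.min(start + maxRows, rows.size());
            // Copy the view so each batch can be processed and released on its own.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            rows.add(i);
        }
        // Each batch would become its own intermediate document; the documents
        // are then concatenated page by page, as the reply above insists.
        for (List<Integer> batch : partition(rows, 3)) {
            System.out.println(batch);
        }
    }
}
```

Because each part ends on its own page boundary, the merged result will not look identical to one continuous table; that is the price of the "pages or bust" constraint.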
Re: [iText-questions] Bolded text is fuzzy in PDFs
Just to emphasize the lack of PDF tools, and goad anyone who can into showing me I'm wrong, let me illustrate how you could find some suspects on your own with well-known tools. Imagine what you could do if you could convert a PDF to a canonical or intermediate form that played nice with decades of prior work LOL. (These may get distorted by hotmail. hotmail of course thinks everything is html...) Your earlier comments about TrueType are plausible with a quick scan of the two docs for overt font refs:

    wget -O B -S -v http://www.windwardreports.com/temp/primf.pdf
    wget -O G -S -v http://www.windwardreports.com/temp/primf2.pdf
    strings G > gs
    strings B > Bs
    mv gs Gs
    more Gs
    more Bs
    more Bs | grep -i font

marchywka:/home/marchywka/junk# more Gs | grep -i font | sed -e 's/[^a-zA-Z ]//g' | sed -e 's/  */ /g'

    obj Contents R MediaBox Parent R Resources Font F R F R F R ProcSet PDF Text ImageB ImageC ImageI XObject Type Page endobj
    obj Contents R MediaBox Parent R Resources Font F R F R F R ProcSet PDF Text ImageB ImageC ImageI XObject Type Page endobj
    obj Contents R MediaBox Parent R Resources Font F R F R F R F R ProcSet PDF Text ImageB ImageC ImageI XObject Type Page endobj
    obj Annots R R Contents R MediaBox Parent R Resources Font F R F R F R ProcSet PDF Text ImageB ImageC ImageI XObject Type Page endobj
    obj BaseFont DIMJAQCalibri Encoding WinAnsiEncoding FirstChar FontDescriptor R LastChar Subtype TrueType Type Font Widths endobj
    obj BaseFont DKIIHVArialBold Encoding WinAnsiEncoding FirstChar FontDescriptor R LastChar Subtype TrueType Type Font Widths endobj
    obj BaseFont DWWVFAArial Encoding WinAnsiEncoding FirstChar FontDescriptor R LastChar Subtype TrueType Type Font Widths endobj
    obj BaseFont DOFDBOWingdings FirstChar FontDescriptor R LastChar Subtype TrueType Type Font Widths endobj
    obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName DIMJAQCalibri ItalicAngle StemV Type FontDescriptor endobj
    obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName DKIIHVArialBold ItalicAngle StemV Type FontDescriptor endobj
    obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName DWWVFAArial ItalicAngle StemV Type FontDescriptor endobj
    obj Ascent CapHeight Descent Flags FontBBox FontFile R FontName DOFDBOWingdings ItalicAngle StemV Type FontDescriptor endobj

marchywka:/home/marchywka/junk# more Bs | grep -i font | sed -e 's/[^a-zA-Z ]//g' | sed -e 's/  */ /g'

    Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB ImageC ImageIFontF RF RF RMediaBox
    Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB ImageC ImageIFontF RF RMediaBox
    Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB ImageC ImageIFontF RF RF RF RMediaBox
    Parent RContents RTypePageResourcesXObjectimg Rimg RProcSet PDF Text ImageB ImageC ImageIFontF RF RF RMediaBox Annots R
    FontBBox CapHeight TypeFontDescriptorFontFile RStemV Descent Flags FontNameOZABETArialBoldMTAscent ItalicAngle
    BaseFontOZABETArialBoldMTCIDSystemInfoOrderingIdentityRegistryAdobeSupplement W TypeFontSubtypeCIDFontTypeFontDescriptor RDW CIDToGIDMapIdentity
    DescendantFonts RBaseFontOZABETArialBoldMTTypeFontEncodingIdentityHSubtypeTypeToUnicode R
    FontBBox CapHeight TypeFontDescriptorFontFile RStemV Descent Flags FontNameTOCCAPArialMTAscent ItalicAngle
    BaseFontTOCCAPArialMTCIDSystemInfoOrderingIdentityRegistryAdobeSupplement W TypeFontSubtypeCIDFontTypeFontDescriptor RDW CIDToGIDMapIdentity
    DescendantFonts RBaseFontTOCCAPArialMTTypeFontEncodingIdentityHSubtypeTypeToUnicode R
    BaseFontHelveticaTypeFontEncodingWinAnsiEncodingSubtypeType
    BaseFontZapfDingbatsTypeFontSubtypeType

marchywka:/home/marchywka/junk#

I've also got utilities for building vocabulary lists and diffing them etc., if you want to find words in one list that are missing in the other, for example.

Mike Marchywka
marchy...@hotmail.com

Note: If I am asking for free stuff, I normally use for hobby/non-profit information but may use in investment forums, public and private. Please indicate any concerns if applicable.

From: ad...@windward.net
To: itext-questions@lists.sourceforge.net
Date: Wed, 12 May 2010 10:40:42 -0600
Subject: Re: [iText-questions] Bolded text is fuzzy in PDFs

Hi Leonard,

Here are links to both the PDFs.
Bad - http://www.windwardreports.com/temp/primf.pdf
Good - http://www.windwardreports.com/temp/primf2.pdf

thanks
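For anyone who would rather not chain strings, grep and sed, the font check above can be approximated in a few lines of plain Java. This is only a sketch under the same assumption the shell pipeline makes, namely that the font dictionaries are stored uncompressed in the file; the class name FontNameScan and the sample text are made up for illustration, and this is not an iText API:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists /BaseFont names found in raw, uncompressed PDF text. */
final class FontNameScan {

    // Matches "/BaseFont /Name" or "/BaseFont/Name" and captures the name token.
    private static final Pattern BASE_FONT =
            Pattern.compile("/BaseFont\\s*/([^\\s/<>\\[\\]()]+)");

    static Set<String> baseFonts(String rawPdfText) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = BASE_FONT.matcher(rawPdfText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String text;
        if (args.length > 0) {
            // ISO-8859-1 maps every byte to one char, so binary data survives the decode.
            text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        } else {
            // Made-up sample in the shape of the font objects listed above.
            text = "<< /Type /Font /Subtype /TrueType /BaseFont /DIMJAQ+Calibri >> "
                 + "<< /BaseFont/Helvetica/Type/Font >>";
        }
        for (String name : baseFonts(text)) {
            System.out.println(name);
        }
    }
}
```

Unlike the sed filter, this keeps the "+" that separates a subset tag from the font name, so a name of the form ABCDEF+Arial-BoldMT stays readable. It will find nothing in files whose font objects sit inside compressed object streams.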
Re: [iText-questions] how to detect remote links in a PDF ?
From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 19:01:15 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

There is no such thing as canonical PDF - anything that complies with the PDF specification is valid. That allows for various uses of compression, ASCII encoding, etc.

Well, not really. If there are rules for the PDF standard then you could in fact create some alternative representation - it could be super big, verbose, complicated, etc. - but it may be a useful intermediate form for various types of work such as debug or ad hoc editing where you don't want to waste time writing custom code to do something simple.

No argument! BUT an intermediate format (or an alternative format) and a canonical format are VERY VERY different things...

Well, at least canonical would be something like PDF that doesn't do anything fancy and has rules for otherwise arbitrary choices; then you could do simple things like ASCII searches and maybe binary diffs to test for pixel equality etc.

There are many folks who have developed alternative representations of PDF, whether in XML or other formats, including Adobe ourselves. For example, Adobe has a project codenamed Mars on our Labs site which describes an XML+ZIP-based representation of PDF. It supports all of the features of PDF from PDF 1.7. We provide some tooling for Acrobat Reader, and you are welcome to develop your own. But again, that's NOT canonical - just alternative.

But that would work for the original purpose too. Maybe you should mention these on itext somewhere and refer people to them. It is hard to say you will be accused of being biased any more than you already are, and if the tools work who cares if you are biased? LOL. From your terse descriptions, that even sounds like a sane and workable approach, not what I would have expected (sorry, had to interject LOL).

This is also not irrelevant to the iText implementation, as a prior thread was talking about optimizations at an algorithm level. If you had some attributes of a parsed or intermediate form that make various manipulations easy, it may be a good thing for iText to parse into or even write out for other canned (iText based or not) tools to use.

    cat pdf | itext_parse_to_intermediate_form | my_itext_tool | intermediate_to_pdf -O3 new.pdf

Piping can be slow but obviously you can start mashing tools together etc.

That's why libraries such as iText exist - to provide you with higher level APIs (where possible). They are what one would use to create automated test tools, validators, etc. And many such tools already do exist - so it's definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a standard", since then the argument is "well, I have API xyz and we can do anything with it, if you use my ABC format". I guess having a list would help; is there a PDF developer download somewhere with tools like this?

Adobe Acrobat Professional includes a PDF validator feature as part of its Preflight module, and has since version 7. It is the only publicly available validator that I am aware of, though I have spoken to at least a half-dozen commercial PDF vendors that have told me that they have developed their own validators for their own use. There used to be two limited open source validators - JHOVE and Multivalent. But to my knowledge, neither is currently supported/updated. Since both were Java-based OSS, I would think you could pick them up and run with them if you wished.

Ok, sounds like reasonable starting points. I'm not saying it is trivial to do any of this, but it does seem much of the traffic here never gets referred to any simple diagnostics.

Leonard
Re: [iText-questions] What action is requred in terms of License
Date: Tue, 11 May 2010 15:49:47 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] What action is requred in terms of License

Michael Olenick wrote: People -- Call or write to Bruno's sales agent.

Thanks! I took a crash course in economics at http://www.vlerick.com/ last year, discovering there's more to the IT business than writing code. It's all about the business model, not just the business model used for iText, but also the business model of the end user/developer.

I think many of the issues we come up against are actually due to thinking about business before technological issues. I won't mention any names, but if you know of any large publicly traded companies that derive significant revenue from PDF, then this may or may not relate to those entities. Walled gardens have most recently been tried by cell phone companies, and if you search SEC filings for such terms, there has lately been a recognition that they are bad for business - users and developers get mad. So, to the extent business comes before making useful products, that is something to consider. We all want to make money, but creating artificial barriers and designing products that lock people into fixed vendors or ways of thinking is rarely helpful.
Re: [iText-questions] how to detect remote links in a PDF ?
Date: Sun, 9 May 2010 23:08:51 +0200
From: papa...@googlemail.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] how to detect remote links in a PDF ?

Colleagues,

For an application, one needs to detect the hyperlinks (i.e. done with Chunk.setRemoteGoto) in a PDF which point to another PDF. Can someone point me to a solution?

Question for Leonard or others who have read the spec: if you literally ONLY want to list the links, not parse the document or determine any context, are they likely to be hidden, or can you just use text tools to find strings that start with or contain http? For example,

    cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
    cat *.pdf ../Desktop/*.pdf | strings | grep http

These seem to work in that they find things with http, but I'm not sure what would be missing. Many of these seem to be surrounded by XML or prefixed with /A, but I'm not sure what other contexts may exist. Thanks.

Thank you very much in advance,
Pieter Vankeerberghen
Re: [iText-questions] how to detect remote links in a PDF ?
From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 06:44:13 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

Prior to PDF 1.5, you could have done a grep (or equivalent) since only stream objects were compressed. However, as of PDF 1.5, we now have object streams, where groups of objects are placed into a stream and then compressed - which means that grep will no longer work. Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, such as PDF/A) use object stream compression to keep file sizes down. I've been trying to recommend that other products do the same.

Is there some utility, like in pdftk, to convert a PDF with arbitrary stuff in it to some standard or canonical format that lets it be used with other tools, so you don't have to write custom code for every little trivial variation of a thing you wish to accomplish? For example,

    cat xxx.pdf | pdf_to_standard_form | grep http

Obviously applicability would go beyond the immediate question, but it would also let people writing iText code have some way to check their results more easily than "it opened in proprietary Adobe product X but in black box Y it greyed out 3 menu options and wouldn't let me save it unless blah blah blah". There is nothing wrong with a human readable end product, but given the complexity of these things it would be nice to use computers to automate certain things, like checking for links or other attributes. Without the ability to use automated tools, everything comes down to a long menu chain and terse messages from products not designed for debug.

So while there certainly exist lots of PDFs that you could grep, the numbers are reducing daily...

Leonard
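The "just scan for http" approach discussed in this thread can also be written in plain Java, which at least avoids locale surprises in sed character classes. A minimal sketch, assuming the link targets are stored uncompressed; the class name RawLinkScan and the sample data are invented for illustration. As Leonard points out above, files written with PDF 1.5 object streams will hide their link dictionaries from this kind of scan, and links made with Chunk.setRemoteGoto that name a file rather than a URL will not contain "http" at all:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds http(s) URLs in the raw bytes of a file. */
final class RawLinkScan {

    // URL characters only; the closing ")" of a PDF string ends the match.
    private static final Pattern URL =
            Pattern.compile("https?://[A-Za-z0-9/:.?&=_%#~+-]+");

    static List<String> findUrls(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost or rejected.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        byte[] data;
        if (args.length > 0) {
            data = Files.readAllBytes(Paths.get(args[0]));
        } else {
            // Made-up fragment in the shape of an uncompressed link action.
            data = "<< /S /URI /URI (http://example.com/other.pdf) >>"
                    .getBytes(StandardCharsets.ISO_8859_1);
        }
        for (String url : findUrls(data)) {
            System.out.println(url);
        }
    }
}
```

An empty result therefore proves nothing about the document; it only says that no URL was visible without decompressing anything.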
Re: [iText-questions] how to detect remote links in a PDF ?
From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 18:09:15 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

There is no such thing as canonical PDF - anything that complies with the PDF specification is valid. That allows for various uses of compression, ASCII encoding, etc. There are certainly tools out there that will uncompress/defilter all the elements in the PDF so that it is plain text and can be searched using text-only tools - though certainly that wouldn't help you for modifications (for obvious reasons).

Well, not really. If there are rules for the PDF standard then you could in fact create some alternative representation - it could be super big, verbose, complicated, etc. - but it may be a useful intermediate form for various types of work such as debug or ad hoc editing where you don't want to waste time writing custom code to do something simple. "XXX Intermediate Form" is a very common file format :) I guess you could imagine expanding it to some XML format where you have decompressed the text and done something with the images, fonts, and formatting information - no idea what. Essentially your claim is that PDF is so bizarre, unique, superlative, and singular, nothing can possibly equal it :) I just downloaded some schematic capture programs and those create documents that are inherently graphical - schematics - but the essential features can be easily extracted as concise text netlists.

That's why libraries such as iText exist - to provide you with higher level APIs (where possible). They are what one would use to create automated test tools, validators, etc. And many such tools already do exist - so it's definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a standard", since then the argument is "well, I have API xyz and we can do anything with it, if you use my ABC format". I guess having a list would help; is there a PDF developer download somewhere with tools like this? This reminds me of when I first got here and you explained logical structure was available, but every time it comes up in a concrete rather than hypothetical case everyone says, "Sure you could preserve structure but it is too complicated to be practical." In the present case, you say the tools exist, but when someone shows up with an error from Acrobat no one can point to a tool to check the PDF.

And let us not forget the expression - just because you only have a hammer, doesn't mean everything is a nail!

That's fine if you have a list of tools somewhere, but I keep seeing the same hammer being used, usually an Acrobat reader with the informative diagnostic "your pdf is damaged". Again, I'm not saying this is a fault with ADBE or PDF, but it would be nice to refer people to some list of tools that give a better diagnostic. In many cases of course all you really care about is the text, and the hammer gets almost everything done. When you need the graphics that is a different situation. So ok, I've only got one swiss army knife LOL.

Leonard
Re: [iText-questions] Guidance Requested - Generating multipage output with header/footer and pg 1 layout
Where have you learned this? Who is spreading this kind of disinformation? Why are you saying this? Attached you can find a very simple example that proves the opposite of what you've learned. And more importantly: what is wrong with the documentation??? What is wrong with the second edition of the book that still causes misconceptions like this? There are still a couple of months left to find a remedy before the book goes to print.

Along the lines of your comments, I would guess the problem would be that someone has to read it. Now, before you dismiss this as a pointless joke, I would just consider it as stating the obvious, as this often is a route to great discovery (you needn't pat me on the back at this point, I am doing so now).

Stating another obvious: we have computers. With appropriately formatted electronic publications, we have tools like grep to help us find what we need with efficient use of resources. However, often there is a problem with vocabulary and document structure for the beginner wishing to become familiar with a topic. Keywords don't help if you don't know what they are; if there is no document structure, common word context is hard to find. So, you can create neologisms to make searching easier, or create structure in the electronic docs. fwiw.

While Leonard and others keep pointing out that PDF has structural capabilities, every time someone here asks a question that lends itself to use of these facilities, the almost unanimous opinion is: sure it's possible, but it is too complicated or hard and no one would use it. I can grep javadocs and have some idea of context since the rendered HTML is fairly uniform, even if not intended to be structured. I can build my own indexes and remove common words, reducing the time to learn jargon etc.
Re: [iText-questions] Performance when flattening form fields
Date: Sun, 25 Apr 2010 22:14:02 -0700
From: forum_...@trumpetinc.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Performance when flattening form fields

After more digging, I'm wondering if the place to do this wouldn't be in the PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do the same flattening operation that PdfStamper does. The ideal would be to factor out the behavior so the code isn't duplicated in both PdfCopy and PdfStamper...

I guess I have the larger question of exactly what parsing is. That is, it seems generally you use iText to 1) read in something, often an existing PDF, 2) do some stuff, then 3) write out a PDF. Presumably as you go through step 2, you are assembling or compiling a bunch of structures that allow you to do step 3 but are more optimized for manipulation and editing of the nascent PDF. If I understand your earlier comments, you apparently don't actually have a generic PDF parser to do step 1 that works with all sequences you could put into step 2. Now, of course, more generally the above approach doesn't scale, as you would always hope to stream to some extent - read what you need, write what you can etc. However, that could probably be hidden somewhat in the implementation of the classes for each step. So, instead of things like

    PdfCoolFeature.doSomething(byte[] pdffile)

you have

    PdfCoolFeature.doSomething(ParsedPdfOperand pdflikething)

where the second signature takes a parameter that is generally optimized for a broad class of common operations.

Does anyone see any technical issues with this as a strategy?

- K

'Kevin Day' wrote:

I've been doing some digging into the performance question that Giovanni Azua has posted about. Some of his findings (using StringBuilder, etc...) are solid improvements to overall iText performance - however, the crux of the performance difference he is seeing between iText and the competing solution is not low level. It's a high level issue.

Here's what's going on: His specific use case involves stamping headers and footers onto pages. The footer contains AcroFields that must be flattened prior to stamping. The performance hit is coming from the fact that, in order to flatten and apply the footer, he is having to:

1. Construct a PDF using PdfStamper
2. Write output to a byte array output stream
3. Re-parse the BAOS into a PdfReader
4. Import the page from the reader for use as a stamp

While this is functional, it is certainly not performant. A much, much faster technique would be to do the flattening to the *reader*, then just import the page to the output writer. This avoids the awkward creation of the temporary PdfReader.

So, the performance delta is not caused so much by iText's low level implementation (although the performance improvements that Giovanni has suggested will help to make iText even faster than it already is) - the delta is really caused by an awkward operation forced on the user by the framework.

So, are there any fundamental reasons to not do flattening, etc... to the PdfReader? My first look at the code indicates that it may be possible to factor this out of PdfStamper (basically, instead of adjusting the AcroFields dictionary and content streams in the PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the PdfReader). I'm thinking of something along the lines of:

    PdfFormFlattener(PdfReader).flatten(pageNumber)

Maybe with supplemental methods for flattenNamedFields(pageNumber), flattenFieldsOfType(pageNumber)

Thoughts?

- K

View this message in context: http://old.nabble.com/Performance-when-flattening-form-fields-tp28357673p28360908.html
Sent from the iText - General mailing list archive at Nabble.com.
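The difference between the two doSomething signatures proposed in this message can be made concrete with a toy model. This is a sketch under loud assumptions: ParsedPdfOperand is the hypothetical name from the message, not an iText class; "parsing" is simulated by a counter; and flatten and stamp are placeholders that do no PDF work at all. It only shows why chaining steps through byte arrays re-parses at every step while passing the parsed object does not:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the proposal; nothing here calls or imitates real iText code. */
final class ParseOnceSketch {

    /** Hypothetical stand-in for an already-parsed, in-memory document. */
    static final class ParsedPdfOperand {
        static final AtomicInteger PARSE_COUNT = new AtomicInteger();
        private final byte[] content;

        private ParsedPdfOperand(byte[] content) {
            this.content = content;
        }

        static ParsedPdfOperand parse(byte[] pdfFile) {
            PARSE_COUNT.incrementAndGet(); // stands in for the expensive parse
            return new ParsedPdfOperand(pdfFile.clone());
        }

        byte[] serialize() {
            return content.clone();
        }
    }

    // Second signature style: operate on the parsed form (placeholders, no real work).
    static ParsedPdfOperand flatten(ParsedPdfOperand doc) { return doc; }
    static ParsedPdfOperand stamp(ParsedPdfOperand doc) { return doc; }

    // First signature style: bytes in, bytes out, so every step parses and serializes.
    static byte[] flatten(byte[] pdfFile) {
        return flatten(ParsedPdfOperand.parse(pdfFile)).serialize();
    }
    static byte[] stamp(byte[] pdfFile) {
        return stamp(ParsedPdfOperand.parse(pdfFile)).serialize();
    }

    static int parsesWithByteArrays(byte[] input) {
        ParsedPdfOperand.PARSE_COUNT.set(0);
        stamp(flatten(input));
        return ParsedPdfOperand.PARSE_COUNT.get();
    }

    static int parsesWithParsedOperand(byte[] input) {
        ParsedPdfOperand.PARSE_COUNT.set(0);
        stamp(flatten(ParsedPdfOperand.parse(input))).serialize();
        return ParsedPdfOperand.PARSE_COUNT.get();
    }

    public static void main(String[] args) {
        byte[] input = {1, 2, 3};
        System.out.println("byte[] chain parses: " + parsesWithByteArrays(input));
        System.out.println("parsed-operand chain parses: " + parsesWithParsedOperand(input));
    }
}
```

Kevin Day's four-step list is the real-world version of the first chain: the stamper output is written to a byte array and then parsed again by a second PdfReader, which is exactly the extra parse the toy counter records.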
Re: [iText-questions] Performance when flattening form fields
Date: Mon, 26 Apr 2010 08:14:32 -0700 From: forum_...@trumpetinc.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Performance when flattening form fields
Mike - can we please reserve this thread for a technical discussion of the merits of the proposal? I'd be happy to have a conversation in a separate thread regarding how iText works.
The merits of the proposal seem to relate to how iText works, no? That is, you are talking about the problem of reducing something to a PDF file solely to read it back in and reparse it. If you didn't have to write out a PDF file, and could pass around the internal representation instead, you would save some time. Isn't that what you are proposing to attack?
- K
Mike Marchywka-2 wrote:
Date: Sun, 25 Apr 2010 22:14:02 -0700 From: forum_...@trumpetinc.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Performance when flattening form fields
After more digging, I'm wondering if the place to do this wouldn't be in the PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do the same flattening operation that PdfStamper does. The ideal would be to factor out the behavior so the code isn't duplicated in both PdfCopy and PdfStamper...
I guess I have the larger question of exactly what parsing is? That is, it seems that generally you use iText to 1) read in something, often an existing PDF, 2) do some stuff, then 3) write out a PDF. Presumably as you go through step 2, you are assembling or compiling a bunch of structures that allow you to do step 3 but are more optimized for manipulating and editing the nascent PDF. If I understand your earlier comments, you apparently don't actually have a generic PDF parser to do step 1 that works with all sequences you could put into step 2. Now, of course, more generally the above approach doesn't scale, as you would always hope to stream to some extent - read what you need, write what you can etc.
However, that could probably be hidden somewhat in the implementation of the classes for each step. So, instead of things like PdfCoolFeature.doSomething(byte[] pdffile) you have PdfCoolFeature.doSomething(ParsedPdfOperand pdflikething), where the second signature takes a parameter that is generally optimized for a broad class of common operations.
Does anyone see any technical issues with this as a strategy?
- K
'Kevin Day' wrote:
I've been doing some digging into the performance question that Giovanni Azua has posted about. Some of his findings (using StringBuilder, etc...) are solid improvements to overall iText performance - however, the crux of the performance difference he is seeing between iText and the competing solution is not low level. It's a high level issue.
Here's what's going on: His specific use case involves stamping headers and footers onto pages. The footer contains AcroFields that must be flattened prior to stamping. The performance hit is coming from the fact that, in order to flatten and apply the footer, he is having to:
1. Construct a PDF using PdfStamper
2. Write output to a byte array output stream
3. Re-parse the BAOS into a PdfReader
4. Import the page from the reader for use as a stamp
While this is functional, it is certainly not performant. A much, much faster technique would be to do the flattening to the *reader*, then just import the page to the output writer. This avoids the awkward creation of the temporary PdfReader.
So, the performance delta is not caused so much by iText's low level implementation (although the performance improvements that Giovanni has suggested will help to make iText even faster than it already is) - the delta is really caused by an awkward operation forced on the user by the framework.
So, are there any fundamental reasons to not do flattening, etc... to the PdfReader?
My first look at the code indicates that it may be possible to factor this out of PdfStamper (basically, instead of adjusting the AcroFields dictionary and content streams in the PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the PdfReader). I'm thinking of something along the lines of: PdfFormFlattener(PdfReader).flatten(pageNumber) Maybe with supplemental methods for flattenNamedFields(pageNumber), flattenFieldsOfType(pageNumber) Thoughts? - K
Re: [iText-questions] Performance when flattening form fields
Date: Sun, 25 Apr 2010 10:58:06 -0700 From: ke...@trumpetinc.com To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Performance when flattening form fields
I've been doing some digging into the performance question that Giovanni Azua has posted about. Some of his findings (using StringBuilder, etc...) are solid improvements to overall iText performance - however, the crux of the performance difference he is seeing between iText and the competing solution is not low level. It's a high level issue.
Here's what's going on: His specific use case involves stamping headers and footers onto pages. The footer contains AcroFields that must be flattened prior to stamping. The performance hit is coming from the fact that, in order to flatten and apply the footer, he is having to:
1. Construct a PDF using PdfStamper
2. Write output to a byte array output stream
3. Re-parse the BAOS into a PdfReader
4. Import the page from the reader for use as a stamp
While this is functional, it is certainly not performant. A much, much faster technique would be to do the flattening to the *reader*, then just import the page to the output writer. This avoids the awkward creation of the temporary PdfReader.
So, there is no internal representation of a PDF doc you can pass around without converting to a file format? If I understand you, you are saying that he is forced to convert a bunch of structures into a PDF file just so he can re-parse this file back into an internal set of structures for further work? How do you know the other package doesn't have to do this? Is this only an issue with flattening, or is that just the specific problem here and other operations may encounter similar problems?
So, the performance delta is not caused so much by iText's low level implementation (although the performance improvements that Giovanni has suggested will help to make iText even faster than it already is) - the delta is really caused by an awkward operation forced on the user by the framework. So, are there any fundamental reasons to not do flattening, etc... to the PdfReader? My first look at the code indicates that it may be possible to factor this out of PdfStamper (basically, instead of adjusting the AcroFields dictionary and content streams in the PdfStamper/PdfCopy/etc... output, we'd make those adjustments to the PdfReader). I'm thinking of something along the lines of: PdfFormFlattener(PdfReader).flatten(pageNumber) Maybe with supplemental methods for flattenNamedFields(pageNumber), flattenFieldsOfType(pageNumber) Thoughts? - K
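The round trip discussed in this thread (flatten, write to a byte array, parse the bytes again) can be pictured with a toy model. This is only a sketch and deliberately not iText code: `ToyDoc`, its comma-separated "file format", and every method on it are hypothetical names invented here. All it does is count how often the bytes get parsed when flattening goes through a serialize/re-parse round trip, compared with flattening the already parsed object in place.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a parsed document. Hypothetical; NOT an iText class. */
final class ToyDoc {
    static int parseCount = 0;                 // how many times bytes were parsed
    final List<String> fields = new ArrayList<>();
    boolean flattened = false;

    /** "Parses" a made-up format: optional "FLAT:" prefix, then comma-separated field names. */
    static ToyDoc parse(byte[] bytes) {
        parseCount++;
        ToyDoc doc = new ToyDoc();
        String text = new String(bytes, StandardCharsets.US_ASCII);
        if (text.startsWith("FLAT:")) {
            doc.flattened = true;
            text = text.substring(5);
        }
        for (String name : text.split(",")) {
            if (!name.isEmpty()) {
                doc.fields.add(name);
            }
        }
        return doc;
    }

    /** Writes the document back out in the same made-up format. */
    byte[] serialize() {
        String body = String.join(",", fields);
        return ((flattened ? "FLAT:" : "") + body).getBytes(StandardCharsets.US_ASCII);
    }

    /** Flattens in place, directly on the parsed structure. */
    void flatten() {
        flattened = true;
    }

    /** The awkward path: flatten, serialize, then parse the bytes a second time. */
    static ToyDoc flattenViaRoundTrip(ToyDoc in) {
        in.flatten();
        return parse(in.serialize());
    }
}
```

In this model the round-trip path costs one extra `parse` per document, which is the overhead the proposal aims to remove. How large that cost is in iText itself would have to be measured.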
Re: [iText-questions] performance follow up
From: brave...@gmail.com Date: Sat, 24 Apr 2010 13:05:26 +0200 To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up
Hello,
On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
Don't know if it'll make any difference, but the way you are reading the file is horribly inefficient. If the code you wrote is part of your test times, you might want to re-try, but using this instead (I'm just tossing this together - there might be type-os):
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buf = new byte[8092];
    int n;
    while ((n = is.read(buf)) >= 0) {
        baos.write(buf, 0, n);
    }
    return baos.toByteArray();
I tried your suggestion above and it made no significant difference compared to doing the loading from iText. The fastest I could get my use case to work using this pre-loading concept was by loading the whole file in one shot using the code below.
If, as indicated below, you are generally IO limited, don't throw the code out yet. If you must copy data you want to use array based methods as often as possible, but the first preference is to avoid copies, unless of course you are strategically preloading or something. I often just turn everything into a byte array, but obviously this doesn't scale too well unless you are content to let VM do your swapping for you. Ideally you would just load what you need in a just-in-time fashion to avoid tying up idle RAM.
Applying the cumulative patch plus preloading the whole PDF using the code below, my original test-case now performs 7.74% faster than before, roughly 22% away from competitor now ... btw the average response time numbers I was getting:
- average response time of 77ms original unchanged test-case from the office multi-processor-multi-core workstation
- average response time of 15ms original unchanged test-case from home using my MBP
I attribute the huge difference between those two similar experiments mainly to having an SSD drive in my MBP ...
the top hot spots reported from the profiler are related one way or another to IO, so it would be no wonder that with an SSD drive the response time improves by a factor of 5x. There are other differences though e.g. OS, JVM version.
Multi-proc and disk cache can cause some confusion. I wouldn't ignore task manager for some initial investigations - if the CPU drops and the disk light comes on you are likely to be disk limited. With IO it is easy to get nickel-and-dimed to death, as everyone who relays the data can be low on the profile chart but it adds up. Wall-clock times are least susceptible to manipulation and may be best for A-B comparisons if you have control over other stuff running on the machine (cash flow versus pro-forma earnings LOL). If you can subclass the random access file thing you may be able to first collect statistics and then write something that can see into the future a few milliseconds. All the generic caches work on past results, things like MRU, except maybe the prefetch, which assumes you will continue to do sequential memory accesses. If you are in a position to make forward looking statements that have a material impact on your performance (ROFL) you may be able to do much better.
Best regards, Giovanni
    private static byte[] file2ByteArray(String filePath) throws Exception {
        InputStream input = null;
        try {
            File file = new File(filePath);
            input = new BufferedInputStream(new FileInputStream(filePath));
            byte[] buff = new byte[(int) file.length()];
            input.read(buff);
            return buff;
        } finally {
            if (input != null) {
                input.close();
            }
        }
    }
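One detail worth spelling out for the two loading snippets in this message: the `InputStream.read(byte[])` contract allows a call to return fewer bytes than requested, so a single `read` into a file-sized buffer is not guaranteed to fill it. The following is a minimal sketch of both loading styles with that in mind. The class and method names (`StreamBytes`, `readAll`, `readExactly`) are made up for this illustration and are not part of iText or of the code posted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {
    /** Copies the whole stream in chunks; the loop ends when read() reports end of stream (-1). */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) >= 0) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Fills a buffer of known length, looping because one read() may deliver fewer bytes. */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(buf, off, length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " of " + length + " bytes");
            }
            off += n;
        }
        return buf;
    }
}
```

`readExactly` corresponds to the "load the whole file in one shot" idea when the length is known up front (for example from `File.length()`), while `readAll` works when it is not.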
Re: [iText-questions] performance follow up
Isn't there something in PDF about linearization? ( the term comes up as a suggestion on google, LOL). How can you compare the two resulting pdf's in terms of dynamic attributes or arbitrary ordering or some items- given issues with IO and access patterns this could be an issue. In fact, you could even imagine that if you could reorder somethings you get win-win for creation and future rendering time. What is the extent of the freedom here? It sounds like any hints you would generate for reader could be used during document manipulation in itext. Date: Sat, 24 Apr 2010 11:59:14 -0700 From: forum_...@trumpetinc.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up If the file is being entirely pre-loaded, then I doubt that IO blocking is a significant contributing factor to your test. I think that the best clue here may be the difference between performance with form flattening and without form flattening. Just to confirm, am I right in saying that iText outperforms the competitor by a significant amount in the non-flattening scenario? If that's the case, then it seems like we should see significant differences in the profiling results between the flattening and non-flattening scenarios in iText. Would you be willing to post the profiling results for both cases so we can see which code paths are consuming the most runtime in each? Another possibility if the profiling results show similar hotspots is that the form flattening algorithms in iText are using the hotspot areas a lot more than in the non-flattening case. There may be a bunch of redundant reads or something in the flattening case. Let's take a look at the profiling results and see if we can draw any conclusions about where to go next. BTW - which profiler are you using? Are you able to expand each of the hotspot code paths and see the actual call path that is causing the bottleneck? 
I use jvvm, and the results of expanding the hotspot call trees can be quite illuminating. What I really would like is to get ahold of your two benchmark tests (with and without flattening) so I can run it on my system - do you have anything you can package up and share?
- K
Giovanni Azua-2 wrote:
Hello,
On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
Don't know if it'll make any difference, but the way you are reading the file is horribly inefficient. If the code you wrote is part of your test times, you might want to re-try, but using this instead (I'm just tossing this together - there might be type-os):
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buf = new byte[8092];
    int n;
    while ((n = is.read(buf)) >= 0) {
        baos.write(buf, 0, n);
    }
    return baos.toByteArray();
I tried your suggestion above and it made no significant difference compared to doing the loading from iText. The fastest I could get my use case to work using this pre-loading concept was by loading the whole file in one shot using the code below. Applying the cumulative patch plus preloading the whole PDF using the code below, my original test-case now performs 7.74% faster than before, roughly 22% away from competitor now ... btw the average response time numbers I was getting:
- average response time of 77ms original unchanged test-case from the office multi-processor-multi-core workstation
- average response time of 15ms original unchanged test-case from home using my MBP
I attribute the huge difference between those two similar experiments mainly to having an SSD drive in my MBP ... the top hot spots reported from the profiler are related one way or another to IO, so it would be no wonder that with an SSD drive the response time improves by a factor of 5x. There are other differences though e.g. OS, JVM version.
Best regards, Giovanni private static byte[] file2ByteArray(String filePath) throws Exception { InputStream input = null; try { File file = new File(filePath); input = new BufferedInputStream(new FileInputStream(filePath)); byte[] buff = new byte[(int) file.length()]; input.read(buff); return buff; } finally { if (input != null) { input.close(); } } }
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28352147.html
Re: [iText-questions] performance follow up
From: brave...@gmail.com Date: Fri, 23 Apr 2010 12:23:50 +0200 To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up Hello Mike, On Apr 23, 2010, at 12:55 AM, Mike Marchywka wrote: Mark Twain gets to the front so quickly. Again, I'm not suggesting you did anything wrong or bad, I haven't actually checked numbers or given the specific test a lot of thought- 9 data points is usually not all that conclusive in any case and I guess that's my point. There are 10 means, each mean comes from 1K data points, so there are 10K data points for each version tested, not just 9 I thought you had 9 test cases but 9 or 10 doesn't matter much. Unlike other tests of significance, t-test doesn't need a large number of observations. It is actually this case of few observations e.g. 10 yeah, personally I've never liked nonparametrics and other approaches that magically work with a few samples. Generally they treat the small sample cases by dealing with outliers ( outliars LOL? ) more gracefully. However, if you make some assumptions about population statistics and run monte carlo you can see how often your 9 or 10 points with your chosen test lead to misleading results. This is a bit of an inverse problem- you have data with contributions from many sources ( ie noise ) and you are trying to estimate some underlying clean number- known approaches to this have limits. means one of its main use-cases. Indeed one would need to check the assumptions of independence and normality. Looking at the response times Cache warmness could lead to lots of run-to-run dependence depending on what you measure and I know personally I see this with second invokation of various command line programs. If as you suggest the distro is normal maybe it is just a bunch of random junk but sometimes you see multi-modal and each peak is probably due to something interesting. 
In any case, the point here is to make the code faster and use the stats or whatever other measures you have to point the way.
Best regards, Giovanni
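For readers who want to redo the comparison of two sets of per-run means, here is a small sketch of a two-sample t statistic. The thread does not say which variant of the t-test was used; Welch's unequal-variance form is assumed here, and `WelchT` and its method names are invented for this example. Turning the statistic into a p-value still needs a t-distribution table or a statistics library, which is left out.

```java
final class WelchT {
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double sum = 0;
        for (double v : x) {
            sum += (v - m) * (v - m);
        }
        return sum / (x.length - 1);
    }

    /** Welch's t: difference of means over the combined standard error. */
    static double tStatistic(double[] a, double[] b) {
        double se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    /** Welch-Satterthwaite approximation of the degrees of freedom. */
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length;
        double vb = sampleVariance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }
}
```

As noted in the discussion, the test assumes roughly normal, independent observations. Cache warm-up between runs can violate the independence part, so it is worth plotting the raw times before trusting the number.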
Re: [iText-questions] performance follow up
From: brave...@gmail.com Date: Fri, 23 Apr 2010 12:55:03 +0200 To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up
On Apr 22, 2010, at 11:18 PM, trumpetinc wrote:
I like your approach! A simple if (ch > 32) return false; at the very top would give the most bang for the least effort (if you do go the bitmask route, be sure to include unit tests!).
Doing this change saves approximately two seconds out of the full workload, so isWhitespace now shows 8s instead of 10s and stays at 1%. The numbers below include two extra changes: the one from trumpetinc above and migrating all StringBuffer references to use StringBuilder instead. The top are now:
PRTokeniser.nextToken 8% 77s 19'268'000 invocations
RandomAccessFileOrArray.read 6% 53s 149'047'680 invocations
MappedRandomAccessFile.read 3% 26s 61'065'680 invocations
PdfReader.removeUnusedCode 1% 15s 6000 invocations
PdfEncodings.convertToBytes 1% 15s 5'296'207 invocations
PRTokeniser.nextValidToken 1% 12s 9'862'000 invocations
PdfReader.readPRObject 1% 10s 5'974'000 invocations
ByteBuffer.append(char) 1% 10s 19'379'382 invocations
PRTokeniser.backOnePosition 1% 10s 17'574'000 invocations
PRTokeniser.isWhitespace 1% 8s 35'622'000 invocations
A bit further down there is ByteBuffer.append_i that often needs to reallocate and do an array copy, thus the expensive ByteBuffer.append(char) above ... I am playing right now with bigger initial sizes e.g. 512 instead of 127 ...
I had a draft message I never sent regarding this. Essentially don't call append, find the end then call String(byte[],offset,length) or better (this gets involved) don't make temp strings, just pass around indexes (you need to give this some thought, but my post was getting quite confusing so I scrapped it).
Best regards, Giovanni
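The effect of the initial buffer size mentioned above (512 instead of 127) can be made visible with a small growable byte buffer that counts its own reallocations. This is a sketch, not iText's `ByteBuffer`: the class name `GrowableBytes`, the doubling growth policy, and the `reset` method are all assumptions made for the illustration.

```java
import java.util.Arrays;

/** Hypothetical growable buffer that records how often it had to reallocate. */
final class GrowableBytes {
    private byte[] buf;
    private int count;
    int reallocations;

    GrowableBytes(int initialCapacity) {
        buf = new byte[initialCapacity];
    }

    void append(byte b) {
        if (count == buf.length) {
            // Assumed policy: double the capacity and copy the old contents over.
            buf = Arrays.copyOf(buf, Math.max(1, buf.length * 2));
            reallocations++;
        }
        buf[count++] = b;
    }

    /** Forgets the contents but keeps the backing array for the next token. */
    void reset() {
        count = 0;
    }

    int size() {
        return count;
    }
}
```

With this policy, appending 1000 bytes costs three array copies from a 127-byte start and one from a 512-byte start. Re-using one instance via `reset()` avoids further copies once the buffer has grown to the largest token seen.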
Re: [iText-questions] performance follow up
This is the draft I mentioned earlier. It was getting a bit convoluted due to over-qualifying each assertion, but if you are using appends a lot, consider the basic idea of finding the delimiters FIRST and then doing one or more array ops, or avoiding string creation altogether. I don't have any idea what you are doing with these strings you parse, but if you are building dictionaries, consider things like the following. On large dictionaries with coherent access patterns, hash tables may not be as efficient as sorted things with the right indexing (this may not be apparent until you start VM thrashing, but if you have ordered queries on static dictionaries, a sparse hash can make a mess of a cache compared to a well thought out b-search on a compact representation of your strings). I'm not entirely sure the multi-pass approach I try to outline below has a lot of merit, but you would need to consider some issues along these lines.
From: brave...@gmail.com Date: Fri, 23 Apr 2010 12:27:42 +0200 To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up
Hello Paulo,
On Apr 22, 2010, at 11:43 PM, Paulo Soares wrote:
FYI I already use a table to map the char to the result for the delimiter testing and the speed improvement was zero in relation to plain comparisons. Paulo
You are right ... changing to a table makes no difference. I checked this with the profiler and the results stay the same.
Why does that method take an int param vs char or, better, a byte? Implicit casts are not normally free; the lookup table probably needs to convert the array index to int anyway, but if you are doing specific boolean comparisons of byte to byte you may be able to avoid some JVM junk. In any case, the method code could hide that if needed at all. As should be clear, I'm not familiar with the code and don't have it in front of me, but a few thoughts. Often reordering operations can help, but it may not be obvious a priori which approach is best.
Multiple passes are generally bad compared to working on blocks that preserve locality and maximize low level memory cache hits. However, due to other issues it could make sense, or at least multiple passes in small blocks. You could consider inlining this method in one place along with any similar ones and making a classification pass during which you scan each char in your input data and create a class for it. Then make a second pass through your now huge data, in which each char is followed by its class, and have processing based on a big switch statement that switches on the class and whatever state info you have made. Or, consider building a table of whitespace locations on your first pass etc etc. If you are currently going through calling something like an append(char) method on each char, you may be better off finding limits and creating a new string with String(byte[], offset, length) etc. Also, presumably you find token limits and then make strings; is it possible to avoid creating strings at all and just pass around indexes into a byte array? This may require massive code changes all over and, depending on what you do with the strings, may or may not help much, as many common operations may be expected to be optimized in native code for strings. However, if you have huge hash tables each lookup may be cheap to compute but each one also trashes the memory cache. You may be better off with ordered index structs that you can implement in Java with byte[] more easily than strings. And, of course, don't ignore obvious data dependent optimizations. If you have strings with a long common prefix like http://www then removing this from compares could be a big help with memory and speed.
Best regards, Giovanni
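The "find the limits first, then build the string once" idea from this draft can be sketched as follows. The names `IndexTokens`, `tokenBounds` and `materialise` are hypothetical, and splitting on a single space is a simplification chosen for the example; a real PDF tokenizer has more delimiter classes. Token boundaries are returned as plain offsets, and a `String` is only created when a caller actually asks for one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class IndexTokens {
    /**
     * Returns [start0, end0, start1, end1, ...] for space-separated tokens,
     * with each end offset exclusive. No strings are created here.
     */
    static int[] tokenBounds(byte[] data) {
        // n bytes can hold at most ceil(n/2) tokens, i.e. at most n + 1 offsets.
        int[] bounds = new int[data.length + 1];
        int used = 0;
        int i = 0;
        while (i < data.length) {
            while (i < data.length && data[i] == ' ') {
                i++;                                   // skip delimiters
            }
            if (i == data.length) {
                break;
            }
            int start = i;
            while (i < data.length && data[i] != ' ') {
                i++;                                   // advance to the token end
            }
            bounds[used++] = start;
            bounds[used++] = i;
        }
        return Arrays.copyOf(bounds, used);
    }

    /** Builds the string in one step from known limits instead of append(char) per byte. */
    static String materialise(byte[] data, int start, int end) {
        return new String(data, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

Whether passing offsets around instead of strings pays off depends on what the callers do with the tokens, as the message itself cautions.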
Re: [iText-questions] performance follow up
Date: Fri, 23 Apr 2010 09:43:08 -0700 From: forum_...@trumpetinc.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up
Yes - it needs to be int. Regardless, we need to focus on the things that are actually consuming run time, and this method isn't one of them (no matter how much it could be optimized).
So you are doing everything internally with 32 bit chars? Not a big deal, but if these are mostly zero there may be better ways to represent them and save memory. You may say, well, RAM is cheap, but that doesn't matter since low level caches are fixed, though I guess you can get a bigger disk and say VM is unlimited.
The only person with data claimed otherwise :)
Mike Marchywka-2 wrote:
does this have to be int vs char or byte? I think earlier I suggested operating on byte[] instead of making a bunch of temp strings, but I don't know the context well enough to know if this makes sense. Certainly De Morgan can help, but casts and calls are not free either. Also, maybe the hotspot runtime has gotten better, but I have found in the past that lookup tables can quickly become competitive with bit operators (if your param is byte instead of int, a 256 entry table can tell you which classes the byte is a member of).
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28343789.html
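For reference, the two forms of the whitespace test being debated can be written side by side. This is a sketch under stated assumptions: `PdfWhitespace` and its method names are invented here and this is not the actual PRTokeniser source. The character set is the six PDF whitespace characters (NUL, tab, line feed, form feed, carriage return, space). The `int` parameter mirrors the point made above that the value can be -1 at end of input.

```java
final class PdfWhitespace {
    // 256-entry table: true for NUL, TAB, LF, FF, CR and SPACE.
    private static final boolean[] TABLE = new boolean[256];
    static {
        for (int c : new int[] {0, 9, 10, 12, 13, 32}) {
            TABLE[c] = true;
        }
    }

    /** Plain comparisons with the early exit suggested earlier in the thread. */
    static boolean byComparison(int ch) {
        if (ch > 32) {
            return false;   // most bytes in a content stream land here
        }
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** Table lookup; the range check keeps -1 (end of input) out of the array. */
    static boolean byTable(int ch) {
        return ch >= 0 && ch < 256 && TABLE[ch];
    }
}
```

Both forms give the same answers. Paulo and Giovanni both report no measurable speed difference between them, so the sketch is about equivalence, not about one being faster.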
Re: [iText-questions] performance follow up
Date: Fri, 23 Apr 2010 10:29:43 -0700 From: forum_...@trumpetinc.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] performance follow up This tells us that the focus needs to be on PRTokeniser and RAFOA. For what it's worth, these have also shown up as the bottlenecks in profiling I've done in the parser package (not too surprising). I'll discuss each in turn: RandomAccessFileOrArray - I've done a lot of thinking on this one in the past. This is an interesting class that can have massive impact on performance, depending on how you use it. For example, if you load your source PDF entirely into memory, and pass the bytes into RAFOA, it will remove IO bottlenecks. I mentioned IO early on as a neglected task where you just pull some generic thing out and let it hand you a byte at a time, clearly if you know your access pattern and can make it coherent you stand to gain a lot. Caching and VM can only guess what you will do next, you may know better :) The one problem with the memory mapped strategy (in it's current implementation) is that really big source PDFs still can't be loaded into memory. This could be addressed by using a paging strategy on the mapped You can have alt implementations in the mean time if you know size a priori. Ideally you would like to be able to operate on a stream and scrap random access. will use memory mapped IO is the Document.plainRandomAccess static public variable (shudder). As if 2 people would use this at the same time ROFL :) Its bad enough you don't have globals... So what about the code paths in PRTokeniser.nextToken()? We've got a number of tight loops reading individual characters from the RAFOA. If the backing source isn't buffered, this would be a problem, but I don't know that is really the issue here (it would be worth measuring though...) 
The StringBuffer could definitely be replaced with a StringBuilder, and it could be re-used instead of re-allocating for each call to nextToken() (this would probably help quite a bit, as I'll bet the default size of the backing buffer has to keep growing during dictionary reads).
Again, why even do this? Find the delimiters and, if you must make a string, make it only when you know start and end; don't keep looping with append(char) no matter how nice the source code looks. If you can use anything with [] in the sig it stands a chance of being faster. Pass as much a priori info to the library classes as you can - append means whoops I found ANOTHER thing to add, when you already have the data just say here is the string I need. And unless you actually need the tokens as strings, consider just returning indexes or something. You may or may not need strings all the time; it may be possible to use int[] or something.
Another thing that could make a difference is the ordering of the case and if's - for example, the default: branch turns around and does a check for (ch == '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9')). Changing this to be: case '-': case '+': case '.': case '0': ... case '9': may be better.
Actually optimizing compilers do stuff like this - you have some known, assumed, or measured branching probability and make the common ones faster (minimize the expectation value of execution time). I haven't checked lately but IIRC the compiler tries to make a switch into a jump table.
The loops that check for while (ch != -1 && ((ch >= '0' && ch <= '9') || ch == '.')) could also probably be optimized by removing the ch != -1 check - the other conditions ensure that the loop will escape if ch == -1.
It might be interesting to break the top level parsing branches into separate functions so the profiler can tell us which of these main branches is consuming the bulk of the run time. Those are the obvious low hanging fruit that I see.
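The two rewrites suggested above are easy to state precisely in isolation. The sketch below uses made-up names (`NumberStart`, `byIfChain`, `bySwitch`, `continuesNumber`) and is not the PRTokeniser code. It shows the if-chain and the equivalent case labels, plus the observation that -1 already fails the digit-or-dot condition, so a separate `ch != -1` test adds nothing to that loop condition.

```java
final class NumberStart {
    /** The check as described for the default: branch. */
    static boolean byIfChain(int ch) {
        return ch == '-' || ch == '+' || ch == '.' || (ch >= '0' && ch <= '9');
    }

    /** The same check expressed as case labels. */
    static boolean bySwitch(int ch) {
        switch (ch) {
            case '-': case '+': case '.':
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return true;
            default:
                return false;
        }
    }

    /** Loop condition without the explicit -1 test: -1 is neither a digit nor '.'. */
    static boolean continuesNumber(int ch) {
        return (ch >= '0' && ch <= '9') || ch == '.';
    }
}
```

Whether the switch form is actually faster on a given JVM is an open question in the thread. The sketch only establishes that the rewrites preserve behaviour, which is the precondition for benchmarking them.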
Final point: I've seen some comments suggesting inlining of some code. Modern VMs are quite good at doing this sort of inlining automatically - a test would be advisable before worrying about it too much. Having things split out actually makes it easier to use a profiler to determine where the bottleneck is.

One thing that is quite clear here is that we need to have some sort of benchmark that we can use for evaluation - for example, if I had a good benchmark test, I would have just tried the ideas above to see how they fared.

If you have a set of benchmarks, you can afford to measure them according to things you think will impact execution time. Then, with enough of them, you can fit execution time to your measurements using alt pieces of code. This is where you start to find statistics helpful (a pdf with value X for attribute A incurs a time penalty of n seconds per increment of X). FWIW, there is also something about identical, independent entities for a statistical sample. If you can measure these they aren't identical.

The top are now:
Re: [iText-questions] AWESOME! performance follow up
btw, why do these need to get changed to ints? And, do you notice with task manager that CPU is not at 100 pct? This often indicates a disk limit - either explicit IO or VM. I've actually had c++ code that I thought was computationally limited turn out to be IO limited. Often simple compression is well worth the effort. Not just for explicit disk transfer rates, but for saving memory to keep more things in low level cache or out of VM. And, again, more generally prefer things with [] in the signature, including string ops if you even need strings instead of [].

Date: Fri, 23 Apr 2010 13:50:49 -0700
From: forum_...@trumpetinc.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] AWESOME! performance follow up

Don't know if it'll make any difference, but the way you are reading the file is horribly inefficient. If the code you wrote is part of your test times, you might want to re-try, but using this instead (I'm just tossing this together - there might be typos):

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buf = new byte[8092];
    int n;
    while ((n = is.read(buf)) >= 0) {
        baos.write(buf, 0, n);
    }
    return baos.toByteArray();

If loading the file into main memory makes any difference, that difference will be a measure of the impact of virtual-native interface interaction. In effect, this is telling us whether the calls to file.read() should be replaced with file.read(byte[]).

From your results, are you seeing a big difference between iText and the competitor when you aren't flattening fields vs when you are flattening fields? Your profiling results aren't indicating bottlenecks in that area of the code. If iText is much faster than the competitor in the non-flattening scenario, but slower than the competitor in the flattening scenario, I'm having a hard time reconciling the data presented so far.
Giovanni Azua-2 wrote:

I am sooo sorry, the performance is worse with the change for pre-loading the PDFs in the test-case :(( The problem was that I ran the benchmarks with a small mistake in my test case ...

Loading the HEADER demonstrates how to load flattened pre-formatted PDF part templates ...
Loading the FOOTER demonstrates how to load PDF part templates containing fields that need to be populated.

The mistake was to leave the HEADER always fixed ... so it would load only the flattened PDF template and not the footer (see below) [sigh]. In any case it is good to know that loading flattened PDF parts is cheaper. I mistakenly ran the last benchmark like this:

    private static byte[] file2ByteArray(String filePath) throws Exception {
        InputStream input = null;
        ByteArrayOutputStream output = null;
        try {
            input = new BufferedInputStream(new FileInputStream(HEADER_PATH));
            output = new ByteArrayOutputStream();
            int data = input.read();
            while (data != -1) {
                output.write(data);
                data = input.read();
            }
            return output.toByteArray();
        } finally {
            if (input != null) {
                input.close();
            }
            if (output != null) {
                output.close();
            }
        }
    }

View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28346146.html
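For comparison, here is a minimal sketch of how the helper might look with the two points from this exchange applied: it opens the path it is given (rather than a fixed constant) and copies in blocks rather than one byte per call. The class name FileBytes and the 8192-byte buffer size are my own assumptions, and try-with-resources needs Java 7 or later, which is newer than what this thread was using.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of iText or of the benchmark posted here.
final class FileBytes {

    // Copies the whole stream into memory using block reads.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192]; // assumed size; worth tuning by measurement
        int n;
        // read(byte[]) returns -1 at end of stream
        while ((n = in.read(buf)) >= 0) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Reads the file named by the parameter, not a fixed constant.
    static byte[] file2ByteArray(String filePath) throws IOException {
        try (InputStream in = new FileInputStream(filePath)) {
            return readAll(in);
        }
    }
}
```

The resulting array could then be handed to whatever consumes the PDF bytes; whether it changes the benchmark numbers is exactly what would need re-measuring.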
Re: [iText-questions] performance follow up
Date: Fri, 23 Apr 2010 11:53:09 -0700
From: forum_...@trumpetinc.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Parsing PDF requires a lot of random access. It tends to be chunked - move to a particular offset in the file, then parse as a stream (this is why paging makes sense, and why memory mapping is effective until the file gets too big).

Yes, that is great, but instead of a generic MRU approach are there better predictions you can make, even start loading pages before having to wait later etc? Maybe multithreading makes sense here.

But the parsing is incredibly complex. You can have nested object structures, lots of alternative representations for the same type of data, etc...

Surely there are rules, and I'm sure this topic has been beaten to death in many CS courses (as have stats LOL). Profiling should point to some suspects. Algorithmic optimizations may be possible, as may plain coding changes. Most compilers operate sequentially on input, maybe in multiple passes; I'm sure you can find ideas easily in a variety of sources.

And we definitely don't know size of any of these structures ahead of time.

Well, you don't need to know it a week ahead of time, but you could maybe waste an access or two finding sizes if that can be done more quickly than just reading everything.
Re: [iText-questions] performance follow up
Date: Fri, 23 Apr 2010 14:53:57 -0700
From: forum_...@trumpetinc.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

I'd love to discuss specific ideas on prediction - are you familiar enough with the PDF spec to provide any suggestions?

No, I started to play with itext for some specific things and then lost time/interest/need, but I have a general interest in the topic and may jump back in at some point.

Some obvious ones are the xref table - but iText reads that entirely into memory one time and holds onto it, so it seems unlikely that pre-fetch would do much there (other than having the last 1MB of the file be the first block pre-fetched - but any sort of paging implementation would handle that already). The rest... well, from my experience with this, you've got objects that refer to other objects that refer to other objects. And there's really no way to know where in the object graph you need to go until you parse and then go there. So I think I'll need some concrete examples of how this might be done with PDF structure - just to get my creativity going!

Well, in a case like that you may want to try to reorder and glean all the stuff you need from what you have in memory before following all the references. Along the lines of "find both delimiters in your array and use String(byte[],offset,length) instead of append(char) a zillion times": scan your current local objects for the references they need and queue those up before chasing after each one. It may turn out that sorting these and getting them in some order creates a net time savings - I wouldn't have believed this myself until I actually sorted a huge dataset prior to running a program and it turned it from impractical to practical runtime due to increased access coherence. Disk is slower than the low level cache :)
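As a rough illustration of the "queue the references up and sort them before chasing them" suggestion: the sketch below assumes the file offsets of the pending objects have already been looked up (for PDF, that would be via the in-memory xref table). The class CoherentFetch and both methods are hypothetical, and total seek distance is only a crude stand-in for real IO cost.

```java
import java.util.Arrays;

// Hypothetical sketch, not iText code: order pending reads by file offset so
// the access pattern sweeps forward through the file instead of jumping around.
final class CoherentFetch {

    // Returns a sorted copy of the pending offsets; the input is left untouched.
    static long[] planReads(long[] pendingOffsets) {
        long[] plan = pendingOffsets.clone();
        Arrays.sort(plan);
        return plan;
    }

    // Sum of absolute jumps when visiting the offsets in the given order,
    // starting from offset 0. A crude proxy for seek cost, nothing more.
    static long seekDistance(long[] order) {
        long total = 0;
        long current = 0;
        for (long offset : order) {
            total += Math.abs(offset - current);
            current = offset;
        }
        return total;
    }
}
```

For the offsets {900, 10, 500, 20}, visiting in the given order covers a distance of 2760, while the sorted plan {10, 20, 500, 900} covers 900. Whether that translates into wall-clock savings depends on the paging and caching underneath, so it would need a benchmark.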
Re: [iText-questions] performance follow up
Cool, analysis is always a plus and easier to discuss than adjectives. Just a few rather trivial comments.

Date: Thu, 22 Apr 2010 02:02:31 +0200
From:
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] performance follow up

Hello, Good news ... after applying the attached patch to trunk and doing yet another performance experiment using the previously posted workload these are the results: [...] Is iText with the patch better than before?

This of course is where you consult Mark Twain. LOL. iText is or isn't better than before (for some particular use case) irrespective of the data you currently have, but the question is: does the data allow you to reject the conclusion that they have the same execution times with some confidence level? Finding ways to explain or attribute the noise in some kind of model would of course be a reasonable thing to consider if you had a few more test cases with some relevant parameters (number of fonts you will need or something). The statistics are just a guide to help you infer something causal - in this case perhaps something like "did the patch cause itext to get better?", as you suggested originally. If you can start describing where and how much it got better, response surfaces I guess, then of course you are starting to develop strategy logic, and could take a given task and feed it to the patched or non-patched version (among a new family of alternative implementations) depending on the parameters you know about it. Obviously for the cases you have only one decision makes sense, and offhand, based on what you said about the nature of the patch, I don't know of any case where generating gratuitous garbage is a good strategy LOL.

The paired observation of the means are:

At this stage it is usually helpful to look at the data, not just start dumping it into equations you found in a book.
I'm not slamming you at all, just that it's helpful to have a check on your analysis even if you are using something canned like R or a commercial package, more so if you just wrote the analysis stuff today. Don't ignore things like histograms etc.; after all my criticisms of PDF for its ability to obscure information with art, sometimes there are pictures worth a thousand words. And of course use the pictures to suggest various sanity checks you can write.

The Letter PDF looks good i.e. the patch didn't seem to break anything but you will have to run the unit tests on it.

LOL, often people forget this step. Also it sounds like the alt package is still faster by a clinically significant amount - an amount relevant to someone. There may be more coding optimizations or algorithmic optimizations. For example, converting a string to a byte array could have some benefits; hard to know offhand since that may incur more java code than native code to manipulate, but something to consider in a more general case. With a byte array you may be able to avoid creating lots of temp strings: just make an int table of the locations of new tokens or pass around indexes instead of temp token strings, etc etc.
Re: [iText-questions] performance follow up
From: brave...@gmail.com
Date: Thu, 22 Apr 2010 23:22:36 +0200
To: brave...@gmail.com
CC: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Hello,

On Apr 22, 2010, at 10:59 PM, Giovanni Azua wrote:

PRTokeniser.isWhitespace is a simple boolean condition that just happens to be called a gazillion times, e.g. 35'622'000 times for my test workload ... if instead of doing it like:

    public static final boolean isWhitespace(int ch) {
        return (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32);
    }

Does this have to be int vs char or byte? I think earlier I suggested operating on byte[] instead of making a bunch of temp strings, but I don't know the context well enough to know if this makes sense. Certainly De Morgan can help, but casts and calls are not free either. Also, maybe the hotspot runtime has gotten better, but I have found in the past that look up tables can quickly become competitive with bit operators (if your param is byte instead of int, a 256 entry table can tell you which classes the byte is a member of).

we used a bitwise binary operator with the appropriate mask(s), there could be some good performance gain ... The function already exists in http://java.sun.com/javase/6/docs/api/java/lang/Character.html#isWhitespace%28char%29 I checked and it already uses bitwise binary operators with the right masks ... we would only need to inline it to avoid the function call costs.

Best regards,
Giovanni
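The 256-entry table idea can be sketched like this. It is only an illustration: WhitespaceTable is a hypothetical class, and whether the table beats the comparison chain on a given JVM is something to measure rather than assume. One caution I am confident of: java.lang.Character.isWhitespace is not a drop-in replacement for the check quoted above, because it returns false for character 0 and true for characters such as 11 (vertical tab) and 28 to 31, so the two accept different sets.

```java
// Hypothetical sketch, not iText code: a table lookup that accepts exactly
// the same characters as the comparison chain quoted in this thread.
final class WhitespaceTable {
    private static final boolean[] WS = new boolean[256];

    static {
        // The six characters accepted by the quoted isWhitespace(int).
        for (int ch : new int[] {0, 9, 10, 12, 13, 32}) {
            WS[ch] = true;
        }
    }

    // Table version; the range check covers -1 (end of stream) and ch > 255.
    static boolean isWhitespace(int ch) {
        return ch >= 0 && ch < 256 && WS[ch];
    }

    // The comparison chain as quoted, kept here only for cross-checking.
    static boolean isWhitespaceChain(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }
}
```

If the input is already an unsigned byte value (0 to 255), the range check can be dropped, which is where a table has the best chance of paying off.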
Re: [iText-questions] performance follow up
From:
Date: Thu, 22 Apr 2010 22:59:43 +0200
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance follow up

Hello Mike,

On Apr 22, 2010, at 12:22 PM, Mike Marchywka wrote:

This of course is where you consult Mark Twain. LOL. iText is or isn't better than before (for some particular use case) irrespective of the data you currently have, but the question is: does the data allow you to reject the conclusion that they have the same execution times with some confidence level?

Good, this is exactly what I meant :)

Finding ways to explain or attribute the noise in some kind of model would of course be a reasonable thing to consider if you had a few more test cases with some relevant parameters (number of fonts you will need or something).

The performance comparison is based on the representative test case exactly as business wants it.

Yes, I know, I'm simply pointing out this is part of a bigger issue that may or may not be relevant to itext but in general is something to consider for long tasks. For example, maybe see FFTW.

As far as I know we need only two fonts: light and bold. So the number of fonts is not a parameter.

I made that up as a simple strawman. If you make a model for execution time given some parameters that are important, you can pick a specific implementation that you expect to be faster. Again, a bit of an extrapolation to make your stats analysis more worthwhile.

This book is the official reference for the course in Advance System Performance Analysis I am taking for my graduate CS Master program in the top-10 Technology University of the world ... so no, it is not just equations I found in a book :)

LOL, you need to see the Alan Greenspan interview, may have been on CNBC or 60 Minutes, talking about financial models and essentially saying he didn't understand them but bright PhDs were doing it so it must be right. The punchline is the appeal to authority or credentials when a factual argument is more to the point.
This in fact is how Mark Twain gets to the front so quickly. Again, I'm not suggesting you did anything wrong or bad; I haven't actually checked the numbers or given the specific test a lot of thought - 9 data points is usually not all that conclusive in any case, and I guess that's my point. Presumably you could keep measuring each case with and without the patch and slowly (sqrt N) get a better estimate of average execution times. Then you end up with 9 data points that are the difference in execution time for each case with/without patch, and asymptotically measure arbitrarily small differences. However, it may be more helpful to look at pictures like histograms or at least run various assumption checks. You may have non-normal distros and those shapes can tell you something about causes; not always, but it helps to look before taking one result and running with it.

Now only 23.8% to go. We only need to make 4 more fixes like the last one and the gap will be gone :) The Profiler shows there are still [...]

I wouldn't count on the other one being the final solution to pdf creation. Do you have symbols info or can you profile it at all?
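To make the "sqrt N" remark concrete: for paired measurements, the standard error of the mean difference is the sample standard deviation of the differences divided by sqrt(n), so quadrupling the number of pairs roughly halves the uncertainty. Below is a minimal sketch of that arithmetic; PairedDifference is a hypothetical helper, it is not the procedure from the book cited in this thread, and it does not look up the t critical value needed for an actual confidence interval.

```java
// Hypothetical helper for paired timing data (e.g. with/without patch).
final class PairedDifference {

    // Element-wise a[i] - b[i]; both arrays must have the same length.
    static double[] differences(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("paired samples must have equal length");
        }
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            d[i] = a[i] - b[i];
        }
        return d;
    }

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    static double standardError(double[] x) {
        double m = mean(x);
        double ss = 0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        double sampleVariance = ss / (x.length - 1);
        return Math.sqrt(sampleVariance) / Math.sqrt(x.length);
    }

    // Paired t statistic: mean difference over its standard error.
    static double tStatistic(double[] a, double[] b) {
        double[] d = differences(a, b);
        return mean(d) / standardError(d);
    }
}
```

With timings {10, 12, 14, 16} and {8, 9, 13, 14} the differences are {2, 3, 1, 2}, the mean difference is 2 and the t statistic is about 4.9. As the thread says, it is still worth plotting the differences before trusting any single number.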
Re: [iText-questions] performance question
Date: Wed, 21 Apr 2010 11:56:00 +0200
From: giovanni.a...@credit-suisse.com
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] performance question

Hello Mike, I appreciate your interest and willingness to help. Thank you.

Mike Marchywka [mailto:marchy...@hotmail.com] wrote:

Forgive the top posting but since this involves statistical modelling with OP from a financial services company and the comments have already started flying, allow me to throw my two sticks onto the fire.

I must say that I don't have a strong quantitative background quite yet ... but I am working on that :o)

I think I exercised great restraint not going off about PDF being ideal for a CDO prospectus or some other inane comments about modelling in financial companies :) It isn't you, it's just that the material is inherently funny at this point...

The fact that I post from this email address only means that I have no access to my private emails from the office so I have no choice.

Someone from Lehman posted on another list but now I digress...

It may help us here formulate a constructive response if you could dig a bit deeper into the close method and see who the big resource hog is. Also, if you can point to the speed limiting step in your alt package it may be interesting to contemplate. We really don't know too much about details of your typical use case, ... snip ... sometimes IO dominates instead of computation just because no ... snip ...

Indeed there is quite some IO involved, but this is also why I am benchmarking: maybe one alternative does more IO than the other for the same use-case, or maybe my implementation of the use-case is not optimal, which would also be a valid outcome of the experiment :o) My assumption is that running the same workload under the same conditions (my development environment) should show if there is a significant performance difference between the two alternatives, i.e. compare two means. I include the full code for the use-case and workload below.
Also, since you went to all the trouble of doing a stats analysis, and since these things are supposed to be deterministic, it may help to get some idea how the execution time noise appears if indeed it is a significant fraction of the average. Presumably this is things like OS, other tasks on machine if you measured wall clock (not cpu time devoted to you) and GC and other stuff including maybe disk and memory cache states. I would point out that depending on exactly what you are measuring, you could be seeing lots of caching hot/cold issues that could dominate the results.

I am aware of this; however, I would not seek a lot of isolation because that would be like creating a synthetic unrealistic environment for running the benchmarks, e.g. if iText did in fact use more memory than the alternative I would not like to hide from the benchmark the consequent higher GC activity.

Again, it depends on what you are measuring. If you just want to tell management that approach X will require time T with distribution blah (normal+SD, or whatever you actually find with params to describe it) on server foo with a gazillion bytes of RAM etc, then that's fine. If you want someone here to figure out why itext is slower, then pointing to a hogging method would help.

A few notes about the micro-benchmarking I did:

- I do warm up by running the use-case 1K times

Even here this is ambiguous: if I did this in a simple case I'd do it from a bash script, and the JVM startup time could be significant and variable. Even running a java program once and putting a 1k loop inside may or may not warm up caches, and it is probably not realistic for your environment, but it would be good enough to point to bottlenecks if you use the profiling tools (see sun.com for jhat or profiling).
- I then benchmark 1K times the elapsed time as shown in the method below performanceBenchmark
- I do these two points above multiple times
- The exact same thing is done for the alternative that generates (almost exactly) the same PDF

This can be an important issue - the final result that people care about is usually just a bunch of pixels. If you can substitute cheaper things that look ok, that could be a big deal.

- The dynamic allocation of Map of data and similar is emulating what will happen in the real implementation and this is done in the exact same way for the alternative.
- I use the output 1K elapsed times for both alternatives to do the paired t-test following the recipe [1, 13.4 Comparing two alternatives], which outputs that iText is lagging behind by [22.53, 24.18] milliseconds with 95% confidence, meaning that one framework performs faster than the other and that the difference of the means is significant and not merely noise.

If you really
Re: [iText-questions] performance question
Forgive the top posting but since this involves statistical modelling with OP from a financial services company and the comments have already started flying, allow me to throw my two sticks onto the fire. It may help us here formulate a constructive response if you could dig a bit deeper into the close method and see who the big resource hog is. Also, if you can point to the speed limiting step in your alt package it may be interesting to contemplate. We really don't know too much about details of your typical use case, we won't forward your comments to IRS or SEC (LOL). Often you do find that mundane things get taken for granted - I've found sometimes IO dominates instead of computation just because no one looked at the code and data gets copied many times and is moved byte by byte etc.

Also, since you went to all the trouble of doing a stats analysis, and since these things are supposed to be deterministic, it may help to get some idea how the execution time noise appears if indeed it is a significant fraction of the average. Presumably this is things like OS, other tasks on machine if you measured wall clock (not cpu time devoted to you) and GC and other stuff including maybe disk and memory cache states. I would point out that depending on exactly what you are measuring, you could be seeing lots of caching hot/cold issues that could dominate the results.

From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Tue, 20 Apr 2010 06:07:13 -0700
Subject: Re: [iText-questions] performance question

And following up on point 3 - you have the source code, feel free to modify it for your personal needs.

Leonard

From: Paulo Soares [mailto:psoa...@glintt.com]
Sent: Tuesday, April 20, 2010 8:48 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] performance question

Hmm, there are lies, damn lies and statistics.
While I don't dispute the 30%, let's see the probable causes for this:

- iText tries to do things correctly, avoiding cutting corners that will come back and bite you later: metadata writing, appearance generation and so on.
- iText is a generic PDF library. It reads, writes and modifies PDFs. Any library designed with a narrower purpose can optimize the areas of interest to perform better.
- iText comes with source and can be extended, modified, altered. This implies that a sensible and probably heavier structure must be in place to allow that. If you have a closed source library with just a single purpose, things can be done faster as that's all it's going to do.
- com.itextpdf.text.pdf.PdfStamperImpl.close() is where everything is written to file; if you avoid calling this nothing will come out.
- There are some speed and memory improvements in the pipeline but I don't know how much % improvement will result or in what areas.

Paulo

From: Azùa Giovanni (KSXD 32) [giovanni.a...@credit-suisse.com]
Sent: Tuesday, April 20, 2010 1:12 PM
To: Post all your questions about iText here
Subject: [iText-questions] performance question

Hello, For a specific Letter generation use-case I prepared a test of statistical significance using a paired t-test for comparing the performance [1] of iText vs a commercial PDF framework. The experiment shows that for our relevant use-case iText underperforms by 30% with 95% confidence.
I did some further investigation of the iText code for this specific use-case and found the following call to be among the most expensive calls: com.itextpdf.text.pdf.PdfStamperImpl.close (line 189), taking up to 195 milliseconds. The code that invokes this method is the following:

    private static void appendFooter(PdfWriter writer) throws Exception {
        Map<String, String> replacements = new HashMap<String, String>();
        replacements.put("ph0001", "X X");
        replacements.put("ph0002", "Head of Customer Acquisition");
        replacements.put("ph0003", "XX XXX");
        replacements.put("ph0004", "Head of Customer Satisfaction");
        PdfReader footerReader = new PdfReader(FOOTER_PATH);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        PdfStamper stamper = new PdfStamper(footerReader, outputStream);
        AcroFields form = stamper.getAcroFields();
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            form.setField(entry.getKey(), entry.getValue());
        }
        stamper.setFormFlattening(true);
        stamper.close();
        int pageOne = 1;
        int xOffset = 5;
        int yOffset = -560;
        PdfReader memoryReader = new PdfReader(outputStream.toByteArray());
        PdfImportedPage importedPage = writer.getImportedPage(memoryReader, pageOne);
        PdfContentByte content = writer.getDirectContent();
        content.addTemplate(importedPage, xOffset,
Re: [iText-questions] Null pointer exception with PdfStamper.close()
Date: Sat, 10 Apr 2010 14:35:40 +0200
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Null pointer exception with PdfStamper.close()

Mike Marchywka wrote:

I would mention the source code is available and compiling with debug info may get you a quicker answer sometimes. With something called fill and properties there is probably something informative, like keys in a hashtable, that will tell you what you forgot (for example, it used a key importantInfo to get a result from a hashtable and assumed it would be non null but you just forgot the code that inserts importantInfo).

Yes, that's correct. I've looked at the fillOCProperties() method, and there are plenty of places where a value is retrieved from a PDF dictionary using some key, but in all cases, there's a check if (value != null) before something important happens, and I need a standalone example + PDF that causes the problem to find out what goes wrong.

I'm writing mobile phone apps and don't usually have stack or line information. When I get an NPE like this and have the situation you describe, it usually turns out I have a null table. There may not be too many other options, although I guess depending on the parameters you could be passing a null, but this could throw illegal arg depending on how it is used etc. Not sure now if Hashtable.get(null) throws IllegalArg or not.

-- This answer is provided by 1T3XT BVBA http://www.1t3xt.com/ - http://www.1t3xt.info
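On the open question about Hashtable.get(null): in the standard JDK, java.util.Hashtable.get throws NullPointerException (not IllegalArgumentException) for a null key, whereas java.util.HashMap.get(null) simply returns null when no null key is stored. A lookup with a non-null key that was never inserted returns null in both. The small sketch below demonstrates this; NullKeyLookup and describeGet are hypothetical names used only for the demonstration, and it says nothing about how iText's own PdfDictionary behaves.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class: reports what Map.get does for a given key.
final class NullKeyLookup {

    // Returns the value, the text "null" for a missing entry, or the text
    // "NullPointerException" if the map rejects the key.
    static String describeGet(Map<String, String> map, String key) {
        try {
            String value = map.get(key);
            return value == null ? "null" : value;
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeGet(new Hashtable<String, String>(), null));
        System.out.println(describeGet(new HashMap<String, String>(), null));
        System.out.println(describeGet(new Hashtable<String, String>(), "importantInfo"));
    }
}
```

So a forgotten insert shows up as a null value that later gets dereferenced, while a null key on a Hashtable fails at the get call itself; the line number in the stack trace, when available, tells the two apart.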
Re: [iText-questions] iText compression modes
Date: Wed, 31 Mar 2010 03:51:35 -0700
From:
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] iText compression modes

Hello Bruno, We need to be able to compress the PDF files as much as we can to store our clients' PDF files.

Apparently you are responding to a message that discusses compression and the need to add code to itext to do that as part of the creation process. This would presumably be the best non-lossy compression, as it knows the data and isn't just empirically trying to find statistical patterns - as would happen, for example, if you tried to use winzip on a pdf file. Contrary to what many managers may think, the PDF creation process doesn't add information ("gee, we just multiplied the file size by 10, look at all that new information and knowledge we generated"), and it may in fact be that the best compression is simply to save all the input data, although others have pointed out that this input data may be quite covert and difficult to find if you want pixel level reproducibility. This input data would in fact be something like a decomposition of your PDF, if you had a way to do that, as is done with audio in something like ACELP, but this would be lossless - ACELP tries to fit an arbitrary waveform to a limited input model and doesn't always work; you already have the input data. Image compression of course is often lossy, and you may not have a lossless restriction. If you really do want minimum file size and don't care about speed or other attributes and can tolerate lossy compression, there are a lot of options. You could even try some lossless data compression things like bzip2 I guess, but again it is better and faster if you (the compression algorithms) know what is in the data rather than having to guess or discover it.

It is VITAL for us to do that and we would like to keep using iText. I've compressed some JPEG files (10 files, approx. 3.3 MB in total) using ImageMagick [convert *.jpg -alpha off -monochrome -compress Zip -quality 100 -units PixelsPerInch -density 600 image_deflate.pdf] and the generated PDF was 700 kB, while using iText it was 3.6 MB. As the storage is a BIG concern for us we are trying to find a solution. I personally dug into your book for compression tips but even with PdfStamper setFullCompression the file was still 3.3 MB. ANY help or suggestion will be much appreciated.

Bruno Lowagie (iText) wrote:

Bruno Lowagie wrote: How much does that matter for your customer?

I've just checked. Introducing the concept of compression level would involve changing about 20 classes. It would be possible to set the compression level:
- on the writer level (mostly page content streams)
- some Image streams
- font streams
- embedded file streams

I'll see if I have the time to do this. While I'm at it, I could also look at the encryption of embedded file streams. Is there anybody I can invoice for this work? Tobias' customer? Tony's customer?

br, Bruno
Re: [iText-questions] iText compression modes
Date: Wed, 31 Mar 2010 13:28:31 +0200 From: i...@1t3xt.info To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] iText compression modes

scriptoid wrote: Hello Bruno We need to be able to compress the PDF files as much as we can to store our clients' PDF files. It is VITAL for us to do that and we would like to keep using iText. I've compressed some JPEG files (10 files, approx. 3.3 MB in total) using ImageMagick [convert *.jpg -alpha off -monochrome -compress Zip -quality 100 -units PixelsPerInch -density 600 image_deflate.pdf] and the generated PDF was 700 kB, while using iText it was 3.6 MB.

You're mixing different concepts. When you set the compression for an image in iText, you are talking about LOSSLESS compression. When you set the compression for an image in ImageMagick, you're talking about LOSSY compression.

I guess I'd also mention that lossless to you means preservation of pixels and maybe some document structure (LOL, although people seem to not want to put this in anyway), not arbitrary stuff like the order of dictionary entries or something (I'm making stuff up since I don't know PDF innards well enough, but others have pointed out that something can be permuted or moved without effect on the pixels coming out). WinZip of course wouldn't know that, but the PDF compression could maybe benefit from uninformative ordering and allocate no bits for it in the compressed format. Again, however, all of this is generated from a set of input data that is probably the most concise representation of your PDF file you will get. Your images are unlikely to compress better once they are mangled into a PDF file, but an image compression algorithm for your source images, text compression for your text, and font compression for your (non-redundant) fonts would be a better way to go. Of course, decompressing all of this (regenerating the PDF again) could take a lot of time.

In iText, the number of pixels (the resolution) isn't changed. I'm sure that ImageMagick reduces the resolution.

I personally dug into your book for compression tips but even with PdfStamper setFullCompression the file was still 3.3 MB.

Read section 10.2.6 of the second edition: Lossless compression won't result in dramatic file size reduction. However, if lossy compression is acceptable, you could use the java.awt.Image to reduce the quality. Listing 10.12 shows an example named CompressAwt that explains how to reduce the resolution. If you have hints to phrase this in a better way so that people don't have the same question you have, feel free to post your suggestions here.

-- This answer is provided by 1T3XT BVBA http://www.1t3xt.com/ - http://www.1t3xt.info
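The lossy route mentioned in the reply above (reducing the number of pixels with java.awt before the image goes into the PDF) can be sketched as follows. This is not the book's CompressAwt listing; the class name `Downscale`, the method `scale` and the scaling factor are hypothetical, and only standard java.awt calls are used.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of iText): redraws an image at a smaller pixel size. */
final class Downscale {

    /** Returns a copy of src whose width and height are multiplied by factor (e.g. 0.5). */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bilinear interpolation is a speed/quality compromise; this choice is an assumption.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Halving both dimensions leaves a quarter of the pixels, which is where most of the size reduction would come from; the detail thrown away cannot be recovered, so this only fits cases where lossy output is acceptable.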
Re: [iText-questions] pdf graphics file questions
Date: Fri, 26 Mar 2010 10:28:04 -0700 To: itext-questions@lists.sourceforge.net From: bri...@ananya.com Subject: Re: [iText-questions] pdf graphics file questions

Hi, (I sent this post on 3/22, but it seems it never got out.) Thanks! I hope I will find the information about AICB. Well, I am still a total beginner. So what are the content parser classes? I know how to write a graphics PDF file, but I would like to have detailed instructions on how to read a graphics PDF file and translate it into Java code.

You guys need a list of frequently used links containing a link to one or more open source renderers. I downloaded one of these and found it quite helpful for dumping things. Also, I'm not sure what exactly you are trying to do, but for testing something like this you could probably find better test vehicles. After integration you may not notice much difference in total performance on many PDF files, as much of it seems to be parsing and stuff other than filling in pixels, but I guess if you want to verify equivalence to some other thing it is good to be able to compare pixel-for-pixel results.

Thanks for everything!

At 11:14 AM 03/22/10, you wrote: You can certainly look into it - it appears that other products support it, so it may be published... Yes, you can use the new content parser classes to find all the vector drawing commands in the PDF - but they are just the series of commands that you will need to turn into something for your needs.

-Original Message- From: Brigit Ananya [mailto:bri...@ananya.com] Sent: Monday, March 22, 2010 1:13 PM To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] pdf graphics file questions

Hi, Thanks a lot, Mr. Rosenthol! So, do you mean that the AICB is not private to Adobe, that I could learn it? I will look at the Adobe Illustrator SDK. So, besides trying to learn AICB, my only remaining question is: with iText, is it possible to read the array of CubicCurve2D.Doubles and the stroke and fill information from a PDF graphics file of curves? Well, this is probably a question for someone else. Thanks in advance for responding.
Re: [iText-questions] Specifying HTTPClient to access URL in iText
Date: Wed, 24 Mar 2010 15:01:57 -0700 From: rthanga...@ebay.com To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Specifying HTTPClient to access URL in iText

Hi, I am using the iText 2.0.7 jar to create a PDF document. I am passing the image URL to the Image.getInstance() method to get the Image, but I am getting the exception below: java.net.ConnectException: Connection timed out: connect. I understand this is due to the firewall setting on the image server side. Does iText provide a way to specify the image URL as well as our own HTTPClient, so that I can specify a proxy to establish the connection?

You want to pass it a connection of some kind? I think the alternative I've seen is to use the byte[] signature. It is very difficult to make an API that takes a URL and gives you complete flexibility when all you want is some unrelated widget. I guess something that takes a connection would not be unreasonable, but at that point you may as well extract the data yourself and pass that into the method, which has been discussed here before.

Thanks in advance. -- View this message in context: http://old.nabble.com/Specifying-HTTPClient-to-access-URL-in-iText-tp28019932p28019932.html
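The suggestion above (fetch the image data yourself, then hand the bytes to iText) can be sketched with the standard java.net classes. The class name `ProxyFetch`, the method names, the timeouts and the proxy host and port are all placeholders chosen for illustration; nothing here is an iText API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper: downloads a resource through an HTTP proxy into a byte array. */
final class ProxyFetch {

    /** Drains a stream completely into memory. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Opens imageUrl via the given proxy; the 10 second timeouts are arbitrary assumptions. */
    static byte[] fetch(String imageUrl, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        URLConnection conn = new URL(imageUrl).openConnection(proxy);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }
}
```

The resulting array would then go to the byte[] overload of Image.getInstance mentioned in the reply. Any other HTTP client could produce the bytes instead, since iText only sees the array.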
Re: [iText-questions] Querry regarding iText jars: What is the Maxsize a PDF can be generated ?
From: psoa...@glintt.com To: itext-questions@lists.sourceforge.net Date: Wed, 24 Mar 2010 16:43:27 + Subject: Re: [iText-questions] Querry regarding iText jars: What is the Maxsize a PDF can be generated ?

Not really deliberate. iText started 10 years ago, when a gigabyte was a lot bigger than it is now. 2 GB looked like enough then, and now nobody has the time to change the code to support larger files. I also doubt that there's a need for it, save for a couple of people (Gylfi included).

Is this the large file issue that normally strikes at about 4 GB when you have unsigned 32-bit ints? I thought most iText APIs used streams and therefore had no real limits related to files. Any user could presumably subclass a file or stream as long as nothing inside iText uses random access with 32-bit indices.

Paulo

- Original Message - From: Gylfi Ingvason To: 'Post all your questions about iText here' Sent: Wednesday, March 24, 2010 4:27 PM Subject: Re: [iText-questions] Querry regarding iText jars: What is the Maxsize a PDF can be generated ?

Don't know about the Java version of iText, but last time I checked, iTextSharp did not support generating PDF files greater than 2 GB, and my impression from Paulo was that this was deliberate and that adding that support was not being planned.

From: Leonard Rosenthol [mailto:lrose...@adobe.com] Sent: Wednesday, March 24, 2010 12:03 PM To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Querry regarding iText jars: What is the Max size a PDF can be generated ?

PDF supports files that are hundreds/thousands of petabytes(!) in size. iText, however, may be limited to the original 10 gigabyte limitation.

From: G Chalpati Rao [mailto:gchalpati...@yahoo.com] Sent: Wednesday, March 24, 2010 9:28 AM To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Querry regarding iText jars: What is the Max size a PDF can be generated ?

Hi, I have a query regarding the usage of the iText jars. What is the maximum size of a file that can be converted to the PDF format? Please respond as soon as possible. What are the APIs we will use to convert a big file to PDF format? I need a file size of more than 400 MB. Thanks, Regards, G.C.Rao
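The three size figures in this thread (2 GB, 4 GB, 10 GB) correspond to three different ways of storing a byte offset. The sketch below only does the arithmetic; the class and constant names are hypothetical, and it assumes the 10 gigabyte figure refers to the 10-digit decimal byte offsets of a classic PDF cross-reference table.

```java
/** Hypothetical constants illustrating where the size limits in the thread come from. */
final class SizeLimits {

    /** Largest offset a signed 32-bit Java int can hold: 2^31 - 1 bytes, just under 2 GiB. */
    static final long SIGNED_INT_LIMIT = Integer.MAX_VALUE;

    /** Largest offset an unsigned 32-bit integer can hold: 2^32 - 1 bytes, just under 4 GiB. */
    static final long UNSIGNED_INT_LIMIT = (1L << 32) - 1;

    /** Largest offset that fits in 10 decimal digits (assumed source of the 10 GB remark). */
    static final long TEN_DIGIT_LIMIT = 9_999_999_999L;

    /** True if the offset can be stored in a Java int without wrapping. */
    static boolean fitsInInt(long offset) {
        return offset >= 0 && offset <= Integer.MAX_VALUE;
    }

    /** Shows what happens when a too-large offset is forced into an int: it wraps around. */
    static int wrapToInt(long offset) {
        return (int) offset;
    }
}
```

A 400 MB file is about 419 million bytes, so it sits comfortably below all three limits; on these numbers, only output approaching 2 GB would run into the int-based restriction described above.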
Re: [iText-questions] Perfomance Question - ByteArray vs Files
From: lrose...@adobe.com To: itext-questions@lists.sourceforge.net Date: Sat, 20 Mar 2010 19:50:02 -0700 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

I would, of course, argue with #3 as would the governments of every country on earth, every major enterprise around the world, etc.

But you let #2 go by without comment? LOL. This is becoming a debate on religion. If you are going to defend PDF in a technical forum by appeal to popularity among large organizations without a primary focus on technology, well, I won't feel too bad about anything I post :) You have discussed the voting machine, now what about the weighing machine? LOL.

-Original Message- From: warren [mailto:warrenonsourcefo...@charter.net] Sent: Saturday, March 20, 2010 2:34 AM To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

Ok. So the answers are
1) To understand how PdfReader works, I need to dig into the source files and attempt to learn the underpinnings of the iText code and Java. I hadn't planned on this since I am implementing iText from another language like a black box. I've only played with Java directly on a limited basis. Guess I'll have to dive in.
2) Disk I/O is Bad
3) PDFs are Bad. Not sure I have much choice since PDF is what the customer wants.
4) Empirical testing is the answer. Code both methods and test various conditions. If results are bad, dig deeper. I have limited access to the server but I'll see what tools are available.
Re: [iText-questions] How to test the pdf output?
Date: Sun, 21 Mar 2010 19:21:21 +0800 From: To: itext-questions@lists.sourceforge.net Subject: [iText-questions] How to test the pdf output?

Hi, What's the *best* way of regression testing my Java code that generates PDF documents? Comparing the PDF file byte for byte to reference documents? How do you do regression testing for iText?

I'd been advocating use of an open source renderer and doing pixel compares, but I'm not actually active in the field. Presumably you can also instrument the renderer to dump various lists or hashes and compare these. I guess you need to think of this somewhat like regression on floating point algorithms: close is good enough, or you will get lots of spurious errors. If you really want to get fancy, and I'm just speculating here, you may be able to find image analysis or compression programs which can try to compress your difference image between reference and test pages and diagnose or attribute the differences to some types of features, giving you some indication of whether they are perceptually important or not without having to open the image itself. Lossy compression usually tries to only capture the stuff a viewer is likely to care about.

Thanks Fred
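The pixel compare with a "close is good enough" tolerance suggested above can be sketched like this. It assumes both pages have already been rendered to images by some external renderer (not shown); the class name `PixelDiff` and the per-channel tolerance scheme are illustrative choices, not an established test harness.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: counts pixels that differ noticeably between two rendered pages. */
final class PixelDiff {

    /** Counts pixels where any of R, G or B differs by more than tolerance (0 to 255). */
    static long countDifferent(BufferedImage ref, BufferedImage test, int tolerance) {
        if (ref.getWidth() != test.getWidth() || ref.getHeight() != test.getHeight()) {
            throw new IllegalArgumentException("page images differ in size");
        }
        long different = 0;
        for (int y = 0; y < ref.getHeight(); y++) {
            for (int x = 0; x < ref.getWidth(); x++) {
                int a = ref.getRGB(x, y);
                int b = test.getRGB(x, y);
                if (channelDelta(a, b, 16) > tolerance      // red
                        || channelDelta(a, b, 8) > tolerance // green
                        || channelDelta(a, b, 0) > tolerance) { // blue
                    different++;
                }
            }
        }
        return different;
    }

    /** Absolute difference of one 8-bit channel, selected by its bit shift. */
    private static int channelDelta(int a, int b, int shift) {
        return Math.abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF));
    }
}
```

A regression test would then assert that the count stays below some small threshold per page. A tolerance of 0 turns this into an exact compare, which tends to flag harmless anti-aliasing differences.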
Re: [iText-questions] Perfomance Question - ByteArray vs Files
From: To: itext-questions@lists.sourceforge.net Date: Sun, 21 Mar 2010 14:26:38 -0500 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

This is not about general coding philosophy or the merits of PDFs. I am in an iText forum specifically because I need to produce a PDF. I always evaluate what the customer's needs and wants are to produce the proper output. If I didn't want a PDF, I wouldn't be here. End of story.

That doesn't mean you need to produce a PDF de novo every time someone from Google finds your site; we are talking about optimizations given constraints. PDF is expensive compared to alternative ways to preview, for example.

In my opinion, one has to balance coding with resources. I've been on servers where developers don't do this, resulting in server crashes and poor performance. I don't like it, so I always try to understand what is going on under the covers to figure out where basic intelligent tradeoffs can be made. Yes, there are different techniques that one can use to achieve this balance and we can have endless discussions about that. But not now. I am nowhere near ready to get into advanced techniques in Java or looking for bottlenecks buried deep in the server, especially since I'm on a closed server with very limited access. Not being a Java programmer and being new to iText, I'm at a big disadvantage. I was hoping I could treat iText more like a black box and, with a basic understanding, use it efficiently. I was trying to ask a specific technical question targeted at memory utilization of PdfReader and PdfStamper. It seemed pretty obvious to me that since I was making two passes there was a tradeoff here. Somewhere we went off the tracks. I am very disappointed there isn't technical expertise available on this forum that can give an overview of the process and answer the question.

If you don't have the expertise yourself to evaluate the strategies we have outlined, how do you even state with confidence "[...] I've been on servers where developers don't do this, resulting in server crashes and poor performance"? Maybe they are doing the best they can with the constraints available and the real problem is that you just need to buy a bigger server. Nobody here knows anything about the statistics or parameters of your system or data. If you want some general ideas to discuss with your own experts, I hope we could help. Otherwise, you will have to hope someone who has looked at the relevant code can just give you a simple answer. I don't even remember your specific question, but what exactly would you do with the answer? The other thing is that people like you often ask questions that don't address their likely real concerns. Talking around a little, time permitting, may help solve an underlying problem or answer a question someone else has while browsing this in the archives. Sometimes people do code kluges for the sake of pushing things out the door. Maybe you have spotted something that is in fact a coding placeholder, but again it would be easier for someone to just look at the source code or a heap dump than to ask such a question. If I still had the code somewhere I'd be tempted to look, but I'm still not sure what you would do with an answer. Do you just want someone to read the code to you? Perhaps you could try asking the original question again. Not that it would help, but I would point out that multiple passes through a dataset are generally bad due to loss of memory locality. This invites thrashing: usually just in the lower-level memory caches, but it can make your disk light stick on due to VM thrashing. The strategy is to try to do block-oriented operations in sizes such that you only use things in lower-level caches.
Re: [iText-questions] Perfomance Question - ByteArray vs Files
From: warrenonsourcefo...@charter.net To: itext-questions@lists.sourceforge.net Date: Fri, 19 Mar 2010 20:34:02 -0500 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

Ok. So the answers are 1) To understand how PdfReader works, I need to dig into the source files and attempt to learn the underpinnings of the iText code and Java. I hadn't planned on this since I am implementing iText from another language like a black box. I've only played with Java directly on a limited basis. Guess I'll have to dive in.

Well, that was my answer, but no one who knows the code has volunteered more details. If you post a heap dump and ask why certain things are being made, someone may or may not know.

2) Disk I/O is Bad

Doing anything is bad; as you point out, allocating and holding is bad. VM just seems to be an unappreciated bottleneck (but it's all in memory and you can always just go buy more memory, that is cheap today, LOL).

3) PDFs are Bad. Not sure I have much choice since PDF is what the customer wants.

Personally, I find they tend to be used where other formats would do, and they are used in such a way that the files increase in size and decrease in information content. Certainly if you have text and can just dump that to the browser, that is faster and uses less memory than adding artwork. It also may just be a matter of obviousness: if you were generating HTML, you might easily identify constant blocks of HTML you could cache instead of regenerating each time, whereas in PDF, if you don't know the details, it may be less obvious.

4) Empirical testing is the answer. Code both methods and test various conditions. If results are bad, dig deeper. I have limited access to the server but I'll see what tools are available.

Well, it is easy to get confused, and the more direct tests are more direct. If you just measure the time intervals routinely (System.currentTimeMillis, IIRC, in Java) you can get some idea where the problems may be. Memory allocations you should be able to examine on your desktop easily. I think my suggestion was that for expensive efforts, spending more time on strategy selection or doing things like sorting the data may produce a net benefit; again, it all depends.
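The routine interval measurement suggested above needs very little code. A minimal sketch, in which the class name `StageTimer` and the idea of wrapping each stage in a Runnable are illustrative assumptions:

```java
/** Hypothetical helper: measures how long one stage of a job takes in wall-clock time. */
final class StageTimer {

    /** Runs the stage once and returns the elapsed milliseconds. */
    static long timeMillis(Runnable stage) {
        long start = System.currentTimeMillis();
        stage.run();
        return System.currentTimeMillis() - start;
    }
}
```

Calling `StageTimer.timeMillis(...)` once around the database query, once around the first PDF pass and once around the stamping pass, and printing the three numbers, would show which stage dominates. For short intervals `System.nanoTime()` is usually the better clock, since `currentTimeMillis` can have coarse granularity on some platforms.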
Re: [iText-questions] Perfomance Question - ByteArray vs Files
From: To: itext-questions@lists.sourceforge.net Date: Fri, 19 Mar 2010 16:16:14 -0500 Subject: [iText-questions] Perfomance Question - ByteArray vs Files

I'm creating a PDF in two passes with my goal to end up with it as a file on the server. The first pass creates the PDF and the second adds things like headers, footers, etc. using PdfStamper. The PDF is being generated from a database so there is a possibility that it could get to be large (a few hundred pages?).

This really has nothing to do with iText, but some people have discussed performance issues, and indeed the inner iText implementations may want to vary depending on what the user can say a priori about some sizes etc. (for large tasks, spending some time up front picking a strategy or specific implementation can pay off). And, of course, I'm a perennial complainer about the resources related to the PDF file versus alternatives. First, it may really help if you profile whatever you have: if there is anything slower than something called PDF, a highly loaded DB could be it. Do you keep requesting the same (static) data from it? Etc. Of course, trying to do everything in memory sounds faster until you find out that your memory is virtual and you keep thrashing. If you want to rely on the OS, great, but if you think you can do better you may benefit from reading and writing to disk the stuff you want instead of making a huge heap and letting the VM system deal with it. Once you are all in physical memory, you want to try to keep locality and stay in a lower-level memory cache (hard with Java). On some large data sets in other settings, I have used a sort (yes, another slow thing) to stop memory thrashing, and the speed improvement was an order of magnitude (from essentially unusable to quite tolerable). So, I guess the most authoritative answer is: it depends.

Right now I have the PdfWriter directing the output to a FileOutputStream. Once that is done, the PdfReader picks it up and is used in PdfStamper to process and send the PDF to the server using another FileOutputStream. It occurred to me that I might be doing this wrong. If PdfReader brings the whole PDF into memory, wouldn't it be better to have PdfWriter put the PDF out as a ByteArrayOutputStream which (I think) PdfReader can pick up? Or does PdfReader only bring it in as it needs it? Or is there some other issue I'm missing? I'm not real clear on what the tradeoffs are between running everything out to files and accepting the I/O, or keeping everything in memory. Can anyone give me some guidance? Thanks! Warren
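One way to avoid committing to either extreme in the question above is to keep the first-pass output in memory while it is small and move it to a temporary file once it grows. The sketch below is a plain java.io illustration of that idea; `SpillingOutputStream`, its threshold and the temp-file prefix are hypothetical, and it does not answer how PdfReader itself buffers its input.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper: buffers in memory up to a threshold, then spills to a temp file. */
final class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File spillFile; // stays null while the data still fits in memory
    private long written;

    SpillingOutputStream(int thresholdBytes) {
        this.threshold = thresholdBytes;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (spillFile == null && written + len > threshold) {
            spill();
        }
        current.write(buf, off, len);
        written += len;
    }

    /** Copies what was buffered so far into a temp file and continues writing there. */
    private void spill() throws IOException {
        spillFile = File.createTempFile("pdf-pass1-", ".tmp");
        FileOutputStream out = new FileOutputStream(spillFile);
        memory.writeTo(out);
        memory = null; // release the in-memory copy
        current = out;
    }

    @Override
    public void flush() throws IOException {
        current.flush();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }

    boolean isInMemory() {
        return spillFile == null;
    }

    /** Only valid while isInMemory() is true. */
    byte[] toByteArray() {
        if (memory == null) {
            throw new IllegalStateException("data was spilled to " + spillFile);
        }
        return memory.toByteArray();
    }

    /** The temp file, or null if nothing was spilled; the caller deletes it when done. */
    File getFile() {
        return spillFile;
    }
}
```

Under this scheme the first pass would write to the stream, and the second pass would open either the byte array or the file depending on `isInMemory()`. The right threshold depends on heap size and concurrency, so it would have to be found by measurement, as the replies in this thread suggest.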
Re: [iText-questions] Perfomance Question - ByteArray vs Files
From: warrenonsourcefo...@charter.net To: itext-questions@lists.sourceforge.net Date: Fri, 19 Mar 2010 17:34:37 -0500 Subject: Re: [iText-questions] Perfomance Question - ByteArray vs Files

I'm asking because I've been on servers where applications did everything in memory, causing both the dreaded JVM out-of-memory error and the server to bog down. I'm not as concerned about marginal performance benefits as I am about being a good server citizen and keeping my resource consumption down. I'm not a Java guru and am unfamiliar with what goes on under the covers.

The first law of computer science is that once the disk light comes on, it's time for a coffee break. If you want to learn how to keep the disk light off, go to sun.com and look at things like profiling tools and jhat, IIRC. Again, OT for iText but a common concern.

I quite agree that one needs to look at the whole system, not just a piece of it. Yes, I am optimizing our ORACLE database for speed, using stored procedures, indexing, etc. So is my assumption wrong about the PDFReader taking the whole thing into memory? If it is taking the whole thing in, then I may as well create the PDF in memory in the first place and hook both passes together (assuming this is possible).

AFAIK the source code is still open; reading it is usually the best way to get a helpful understanding, and often the original authors have forgotten details. If you dump your heap some things may jump out at you. This could also be a reminder that dumping things to disk and the clever use of compression or concise representations could help even if conversions are frequent or somewhat expensive.

If PDFReader doesn't do that, then I'm leaning more towards the file side of things so that if I get a large output from the DB I won't bog down the server. I expect that most PDFs will be a few pages but my users have been known to make strange requests.

As always, I would suggest reviewing the need to create the PDF in the first place if HTML or raw text would do. And you can look at some parameters, like the number of things you get back from the DB, and pick an implementation. But again you need some empirical measures of wall-clock time: simply printing the Java millisecond time diffs may give you a good idea of what the bottleneck is. If you have multithreaded code there could be all kinds of resource contention or CPU spinning etc.; again, it could be anything that limits your performance. If the user requests are highly redundant, you may benefit from caching fonts or intermediate results like headers etc.
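For what it's worth, the "print the millisecond time diffs" suggestion can be sketched with nothing but the JDK. This is only an illustration and not iText code: the class name TimingSketch, the 8 KiB block size and the chunk count are all made up, and the dummy writes stand in for whatever your PDF pass actually does.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class TimingSketch {
    /** Writes `chunks` blocks of 8 KiB to `out` and returns the elapsed wall-clock milliseconds. */
    static long timeWrite(OutputStream out, int chunks) {
        byte[] block = new byte[8192]; // arbitrary block size, stands in for real PDF output
        long start = System.currentTimeMillis();
        try {
            for (int i = 0; i < chunks; i++) {
                out.write(block);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws IOException {
        int chunks = 2000; // roughly 16 MB; pick something close to your real document size

        // Pass 1: everything kept on the heap.
        long memMs = timeWrite(new ByteArrayOutputStream(), chunks);

        // Pass 2: the same bytes sent to a temporary file.
        File tmp = File.createTempFile("timing", ".bin");
        tmp.deleteOnExit();
        long fileMs;
        try (FileOutputStream fos = new FileOutputStream(tmp)) {
            fileMs = timeWrite(fos, chunks);
        }

        System.out.println("in memory: " + memMs + " ms, to file: " + fileMs + " ms");
    }
}
```

The numbers only mean something on the real server under a realistic load; a quiet desktop will usually make both look free.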
Re: [iText-questions] (no subject)
From: lrose...@adobe.com To: itext-questions@lists.sourceforge.net Date: Thu, 18 Mar 2010 13:16:42 -0700 Subject: Re: [iText-questions] (no subject)

PDF doesn’t support a “table structure”; you will need to apply advanced heuristics to figure out what is (or isn’t) a table and what are its “header”, “columns”, etc.

LOL, I've found that swearing and mashing the keyboard help too. I would suggest you reiterate your comments to me about asking authors to retain logical structure if they want to turn information into a work of art. BTW, whoever suggested that WebKit-based HTML-to-PDF converter saved me a lot of work: I was able to drop that into an immediate problem (I will eventually remove the PDF component; I just needed a way to get a list of all the resources needed by a web page, and I had dug into WebKit but didn't have an easy-to-use front end). Thanks.

Leonard

From: Ahmad Amin [mailto:ahmad_a...@siliconexpert.com] Sent: Thursday, March 18, 2010 5:17 PM To: itext-questions@lists.sourceforge.net Subject: [iText-questions] (no subject)

Hi, I'm trying to extract PDF text content automatically. The problem is that when I encounter text in different table structures, I can't differentiate between header and column values. I'm using Eclipse as my Java IDE and the most popular PDF libraries (JPedal, iText, PDFOne Java, PDFBox). All of these libraries extract the text fine but don't give me the capability to detect a PDF table in table format (headers and columns). So I will appreciate any help from your side. Thanks
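To make "heuristics" a little more concrete, here is a rough sketch of the simplest one: treat text fragments whose baselines sit at nearly the same height as one row, then order each row left to right. Everything here is an assumption for illustration. Chunk is a hypothetical holder for text plus position, not a class from iText or any of the libraries named above, and real bills will need more than this (column alignment, wrapped cells, repeated page headers).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RowGrouper {
    /** A positioned text fragment. Hypothetical type, not part of any PDF library. */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups chunks into rows: a chunk joins the current row when its y is within
     * `tol` of the y of the first chunk in that row. Rows come out top to bottom,
     * cells left to right.
     */
    static List<List<String>> rows(List<Chunk> chunks, float tol) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // PDF user space has y growing upward, so descending y means top to bottom.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y));

        List<List<Chunk>> grouped = new ArrayList<>();
        float rowY = 0;
        for (Chunk c : sorted) {
            if (grouped.isEmpty() || Math.abs(c.y - rowY) > tol) {
                grouped.add(new ArrayList<>());
                rowY = c.y;
            }
            grouped.get(grouped.size() - 1).add(c);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Chunk> row : grouped) {
            row.sort(Comparator.comparingDouble((Chunk c) -> c.x));
            List<String> cells = new ArrayList<>();
            for (Chunk c : row) {
                cells.add(c.text);
            }
            result.add(cells);
        }
        return result;
    }
}
```

Deciding which row is the header is a separate guess again (first row, bold font, repeated on every page), which is why agreeing on conventions with the document author beats any of this.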
Re: [iText-questions] adobe error
Date: Mon, 15 Mar 2010 08:36:16 +0100 From: i...@1t3xt.info To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] adobe error

Stephen gallaghan wrote: Hi all, I am producing PDF documents using Java but am gettin some errors with Adobe Acrobat Reader 8 but not 9. Does anybody know of some software that will report what the error is?

No, but I have a TV that doesn't work anymore; do you know what could be wrong? No, you can't, because I'm not saying what is broken. I could have forgotten to plug in the TV (and without electricity the TV doesn't work). Or the TV could be working, and the problem could be a broken remote control. So please don't post a question saying "am gettin some errors"; be more specific and tell us which errors you're getting.

I think he is asking for a diagnostic tool, not an answer, and presumably the tool would work with a broad range of problems: "does anybody know of some software that will report what the error is?" This comes up from time to time; I usually suggest the open source renderer. You will need to instrument it yourself, but it is reasonably easy to follow.
Re: [iText-questions] Can IText Acheive this ?
2. Can I edit/save this PDF?

A form created by iText cannot be saved locally by a student/professor using Adobe Reader (which is why the word "online" is so important). The data can be entered and submitted to a server. On the server, the form can be filled with that data and that filled-out form can be saved. If you need a solution where the form can be saved locally, you have to Reader Enable the form, and that's only possible with Adobe software.

Given that this is a targeted audience, can't you also distribute a custom reader from the open source project that is always enabled? What is so special about the process of saving locally that requires a special option during document creation? Also, since many people assume that PDF is some standard with support from multiple companies, why does the answer "you can only do that with Adobe products" come up so often? Is there an alternative, even if it involves some digging or writing? Adobe may have a lot of things, but none of this should be magic or inherently impossible for others to support. This was always one thing that bothered me about PDF. OK, fine, you can call it a standard and say it is portable and not vendor-specific, but the realistic options for getting tools seem to be limited in some cases. Thanks.
Re: [iText-questions] Integration a 3D CAD drawing into a PDF
Date: Wed, 3 Mar 2010 23:17:47 -0800 From: j...@infolox.de To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Integration a 3D CAD drawing into a PDF

Hello, I'm a rookie in iText and Java and need some help. I am trying to integrate a 3D CAD drawing into a PDF. You can see the code below. Unfortunately it shows the artwork from the bottom view. I don't understand the impact of the parameters on the view. Is there any description with examples of how to use iText to solve that problem? Could anybody help me with that problem?

I was hoping Leonard would have replied to this; didn't you post this same thing the other day? In fact, I think when I claimed that PDF is just a bunch of pixels, he pointed me to an example in which a PDF contained a 3D model of something. I thought he might explain how you could put the model, not a 2D view, into the PDF...

Thank you in advance! Julia

package com.lowagie.toolbox.plugins;

import java.io.*;
import javax.swing.*;
import com.lowagie.text.*;
import com.lowagie.text.pdf.*;
import com.lowagie.toolbox.AbstractTool;
import com.lowagie.toolbox.arguments.*;
import com.lowagie.toolbox.arguments.filters.PdfFilter;
import com.lowagie.toolbox.arguments.filters.U3DFilter;
import java.net.*;

/**
 * This tool lets you add a embedded u3d 3d annotation to the first page of a document. Look for
 * sample files at http://u3d.svn.sourceforge.net/viewvc/u3d/trunk/Source/Samples/Data/
 * @since 2.1.1 (imported from itexttoolbox project)
 */
public class Add3D extends AbstractTool {
    static {
        addVersion("$Id: Add3D.java 3373 2008-05-12 16:21:24Z xlv $");
    }

    FileArgument destfile = null;
    public static final String PDF_NAME_3D = "3D";
    public static final String PDF_NAME_3DD = "3DD";
    public static final String PDF_NAME_3DV = "3DV";
    public static final String PDF_NAME_3DVIEW = "3DView";
    public static final String PDF_NAME_C2W = "C2W";
    public static final String PDF_NAME_IN = "IN";
    public static final String PDF_NAME_MS = "MS";
    public static final String PDF_NAME_U3D = "U3D";
    public static final String PDF_NAME_XN = "XN";

    /**
     * This tool lets you add a embedded u3d 3d annotation to the first page of a document.
     */
    public Add3D() {
        super();
        menuoptions = MENU_EXECUTE | MENU_EXECUTE_SHOW;
        FileArgument inputfile = new FileArgument(this, "srcfile",
                "The file you want to add the u3d File", false, new PdfFilter());
        arguments.add(inputfile);
        FileArgument u3dinputfile = new FileArgument(this, "srcu3dfile",
                "The u3d file you want to add", false, new U3DFilter());
        arguments.add(u3dinputfile);
        StringArgument pagenumber = new StringArgument(this, "pagenumber",
                "The pagenumber where to add the u3d annotation");
        pagenumber.setValue("1");
        arguments.add(pagenumber);
        destfile = new FileArgument(this, "destfile",
                "The file that contains the u3d annotation after processing", true, new PdfFilter());
        arguments.add(destfile);
        inputfile.addPropertyChangeListener(destfile);
    }

    /**
     * Creates the internal frame.
     */
    protected void createFrame() {
        internalFrame = new JInternalFrame("Add3D", true, true, true);
        internalFrame.setSize(300, 80);
        internalFrame.setJMenuBar(getMenubar());
        System.out.println("=== Add3D OPENED ===");
    }

    /**
     * Executes the tool (in most cases this generates a PDF file).
     */
    public void execute() {
        try {
            if (getValue("srcfile") == null) {
                throw new InstantiationException("You need to choose a sourcefile");
            }
            if (getValue("srcu3dfile") == null) {
                throw new InstantiationException("You need to choose a u3d file");
            }
            if (getValue("destfile") == null) {
                throw new InstantiationException("You need to choose a destination file");
            }
            int pagenumber = Integer.parseInt((String) getValue("pagenumber"));

            // Create 3D annotation
            // Required definitions
            PdfIndirectReference streamRef;
            PdfIndirectObject objRef;
            PdfReader reader = new PdfReader(((File) getValue("srcfile")).getAbsolutePath());
            String u3dFileName = ((File) getValue("srcu3dfile")).getAbsolutePath();
            PdfStamper stamp = new PdfStamper(reader,
                    new FileOutputStream((File) getValue("destfile")));

            /*Add Infos to HashMap
            HashMap info = reader.getInfo();
            info.put("Author", "infolox");
            stamp.setMoreInfo(info);
            stamp.insertPage(reader.getNumberOfPages(), reader.getPageSize(pagenumber));*/

            PdfWriter wr = stamp.getWriter();
            PdfContentByte cb = stamp.getUnderContent(pagenumber);
            Rectangle rectori = reader.getCropBox(pagenumber);
            /*Rectangle rect = new Rectangle(new Rectangle(100, rectori.getHeight() - 550,
                    rectori.getWidth() - 100, rectori.getHeight() - 150)); */
            Rectangle rect = new Rectangle(new Rectangle(55, rectori.getHeight() - 675,
                    rectori.getWidth() - 55, rectori.getHeight() - 175));

            PdfStream oni = new PdfStream(PdfEncodings.convertToBytes(
                    "runtime.setCurrentTool(\"Rotate\");", null));
            oni.flateCompress();

            // Create stream to carry attachment
            PdfStream stream = new PdfStream(new FileInputStream(u3dFileName), wr);
Re: [iText-questions] Performance improvement to PdfGraphics2D
Date: Thu, 4 Mar 2010 10:16:30 -0800 From: To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Performance improvement to PdfGraphics2D

Hello, I was using iText to convert a JTable to a PDF. This was consuming a large amount of memory and taking a long time, so I did some memory profiling and have attached a patch that significantly improves performance for us. The following describes what I found, and what the patch does. When printing a JTable, you have to construct a lot of child PdfGraphics2D objects. For each child, the following happens:

1. A BufferedImage is created just so that we can get a regular Graphics2D. This Graphics2D object may never be used, so I patched PdfGraphics2D to construct it only if needed.

But ctor calls are order-0 (humour). Yes, garbage generation can be a big deal, and besides the ultimate GC problems that may not show up on profiling (when the GC thread executes, it doesn't show up in your stack trace), initialization code can take forever because, well, everything is initialized, including large arrays (you don't get initialized memory for free even if it is still one line of source code). Usually you see warnings about this with string manipulations, since temps aren't always appreciated and can become significant in a hurry. I also remember being shocked at the start-up time in some apps that were cleaned up to be more OO; I'm really not sure if anyone cares about init resources... In C++ there is some hope the compiler can fix a lot of OO overhead, but things are worse in Java. Once terms like "graphics" start to appear, the attention goes to inner loops and cool terms related to getting pixels onto the screen. With Java, the optimization in this native code can make all the surrounding stuff an important time sink.

2. Two arrays of PdfGState are created, but are then replaced with the parent's arrays. I patched PdfGraphics2D to create these arrays in the non-private constructor. You might want to consider using the clone() method instead of keeping that private constructor around. The normal .clone() behaviour is very similar to what you have done manually in the .create() method.

Finally, I noticed that the AWT PathIterator.currentSegment(float[]) method creates a double[] internally. That is because the float[]-based method just passes through to the double[]-based method. I modified your use of the PathIterator to take this into account.

Can this patch be included in the next release? Also, I am working on a commercial product. Can you clarify for me whether or not iText PDF can be included as a .jar in our commercial (non-open-source) product? I cannot remember whether using a .jar is considered a derivative work or not. If we're allowed to use it, then I will probably do a little more work on improving performance of iText and will send that on.

Thanks, Peter.
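For anyone who wants to see what the PathIterator point looks like in code, here is a small sketch that walks a shape with the double[] overload and one reused coordinate buffer. It is not the patch itself (that was an attachment and is not reproduced here); the class and method names are invented, and whether the float[] overload really allocates internally depends on the PathIterator implementation you get, so measure before relying on it.

```java
import java.awt.Shape;
import java.awt.geom.PathIterator;

class PathWalk {
    /**
     * Counts the segments of each type in a shape. The result is indexed by the
     * PathIterator constants: SEG_MOVETO (0), SEG_LINETO (1), SEG_QUADTO (2),
     * SEG_CUBICTO (3), SEG_CLOSE (4).
     */
    static int[] countSegments(Shape shape) {
        int[] counts = new int[5];
        double[] coords = new double[6]; // one buffer, reused for every segment
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            // Calling the double[] overload directly avoids any float-to-double hop.
            int type = it.currentSegment(coords);
            counts[type]++;
        }
        return counts;
    }
}
```

A triangle built with moveTo, two lineTo calls and closePath gives one SEG_MOVETO, two SEG_LINETO and one SEG_CLOSE.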
Re: [iText-questions] Converting files to PDF and TIFF
Date: Wed, 3 Mar 2010 19:36:53 +0600 From: kasun0...@gmail.com To: iText-questions@lists.sourceforge.net Subject: [iText-questions] Converting files to PDF and TIFF

Hi all, I am new to iText. I am developing a Java application where a method takes a list of files that can be of type .doc, .txt, .rtf, .html, .TIFF, .odt. The list of files is iterated over and each one is converted and added to a single TIFF file. The new TIFF file is then returned. Another method takes a list of files of the same types; the list is iterated over and each file is converted and added to a PDF file. The PDF file is returned. Will I be able to do this using only iText, or do I need to use another third-party library for this purpose?

JAI comes up sometimes, depending on what images you really end up wanting. I've been pushing an open source renderer to help diagnose your PDF results. Have you looked at the OpenOffice source code? I would think there may be a bit in there for some conversions, but I'm not sure what all it actually does or doesn't do.

If you have any suggestions or thoughts please fill me with them. Thanks. Best Regards, Kasun
Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title
Date: Sun, 28 Feb 2010 07:55:06 -0800 From: sandys...@yahoo.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

Hi Mike, The .jar is always built into the .ear for the application and deployed to the server. The original WL8 version used iText-1.4.7.jar. I did not have to make any significant code changes when I went from iText-1.4.7.jar to iText-2.1.7.jar, while there were quite a few changes going to iText-5.0.1.jar. Since WL10 runs under Java 1.6, and the 5.0.1 version is written in Java 5, I was hoping the Adobe 9 issues were resolved in iText-5.0.1.jar.

Well, it doesn't sound like a gross problem; the insignificant changes would be good suspects, I guess.

I wish it was easier to narrow down the problem in the iText.jar. Since no errors are thrown and I have not been able to pinpoint the difference in the documents, I was hoping somebody had already experienced the problem.

It's hard to know if iText even thinks there is a problem. I guess a debug jar could help; I have gotten into the habit of using the C++ preprocessor with Java and can make builds with various features or for different target platforms (even Java is not entirely platform independent). Really this could be anything: an invalid input image, a messed-up font, etc. If you can identify what the reader is complaining about, you should be able to narrow down the possible code issues. It is possible you could just dump the PDF as an ASCII file and visually compare it to a known good one, or see if the high bit is now being reset using a binary dump utility.

Thanks... S
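If no binary dump utility is handy, the high-bit check is easy enough to sketch in plain Java. This is just an illustration under the assumption that you can get both a known good copy and the suspect copy as byte arrays (for example by saving what the server sends); HighBitCheck and its method names are made up.

```java
class HighBitCheck {
    /** Returns the index of the first differing byte, or -1 if the arrays are identical. */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same prefix: they differ only if one array is longer.
        return a.length == b.length ? -1 : n;
    }

    /**
     * True when `suspect` is exactly `good` with the top bit of every byte cleared
     * and at least one byte actually changed, which is the pattern you would expect
     * if something in the chain treated the PDF as 7-bit ASCII.
     */
    static boolean looksHighBitStripped(byte[] good, byte[] suspect) {
        if (good.length != suspect.length) {
            return false;
        }
        boolean changed = false;
        for (int i = 0; i < good.length; i++) {
            if (suspect[i] != (good[i] & 0x7F)) {
                return false;
            }
            if (suspect[i] != good[i]) {
                changed = true;
            }
        }
        return changed;
    }
}
```

The binary comment line near the top of most PDFs and any compressed stream contain bytes above 0x7F, so a stripped copy tends to differ within the first few dozen bytes.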
Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title
Date: Sat, 27 Feb 2010 11:12:04 -0800 From: sandys...@yahoo.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title

Hi Mike, The code has been under my control through the changes, so I can confirm that no configuration changes were made. Also, all the tests were run on the same server, so that is also not likely to be the problem.

Changing versions of something often creates small config changes. This could be a classpath order or some really subtle thing; unlikely, but I'm pointing out issues. Normally you can't just change a jar file when the API has changed and assume all the methods are the same, and it sounds like you never tried to recompile your app against the new iText jars. I usually try to have a build of something designed to run on a server that also runs from the command line, so it is easier to test.

However, my PC has Adobe 9 installed, which was the client when I ran the tests described below. Then later, I tested it on a PC with Adobe 8 installed and I did not see any of the errors. This could be an Adobe 9 related problem. Does anybody know of a fix for it?

If you want to approach this empirically and look at properties of the final result, version-specific Adobe problems can only be answered by an insider like Leonard. I'm not sure if either version has any means to get a detailed error report or report back to Adobe when it finds an error (obviously I try not to use them, LOL). I'm not sure there is a good pdf-dump tool likely to point to the questionable code inside the PDF, but maybe someone can suggest one. But, again, if you can narrow down the problem item and it seems to be isolated, you may be able to find the iText code responsible and post that.

Thanks, S
Re: [iText-questions] PDF opened in Java shows 'Error Page' in Title
Date: Fri, 26 Feb 2010 11:53:27 -0800 From: sandys...@yahoo.com To: itext-questions@lists.sourceforge.net Subject: [iText-questions] PDF opened in Java shows 'Error Page' in Title This may be an application specific problem. If anybody has experienced it before, please let me know. I am in the process of upgrading a java web application from Weblogic8 to Weblogic10. The application renders invoices in PDF format via iText.jar. There have been some cases where web servers treat PDF as ASCII and clear high bit. You wouldn't want to rule out a change in configuration so if you can diff config files that may be worthwhile ( if not for this specific problem more generally ). Some people have complained about quirks or problems using their code with servlets etc and it isn't hard to write code that only works for a very idiosyncratic server setup. The original WL8 version used iText-1.4.7.jar For the WL10 version I first tried iText-2.1.7.jar saw the errors I am about to describe in this post and so am now trying out iText-5.0.1.jar but still see the same errors. The invoices (1 or 2 page documents) are all rendered correctly, however, some of the invoices are rendered with the title displayed as: Billng App - Error Page (only this line shows) https://... url to the page The title should display: https://... url to the page Error Page is the default title for the application's errorpage.jsp Can you trace the code far enough to get something related to itext? Did you recompile against new itext or just replace jar files? Often catching Exception is done when Throwable is more comprehensive. In particular note that this does not derive from exception, http://java.sun.com/j2se/1.4.2/docs/api/java/lang/NoSuchMethodError.html If the applicaiton is throwing any errors, it is not apparent as there are no errors in the logs or any visible differences in the PDF document. It is always the same documents that display the error. 
Please let me know if you need more information, code extracts or screen shots. Your responses to this would be much appreciated. Thanks! S -- View this message in context: http://old.nabble.com/PDF-opened-in-Java-shows-%27Error-Page%27-in-Title-tp27722675p27722675.html Sent from the iText - General mailing list archive at Nabble.com.
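The point about Exception versus Throwable can be illustrated with a minimal sketch. It assumes the failure is a linkage error of the kind that can appear when a jar is swapped without recompiling; the class and method names below are hypothetical and only simulate that situation.

```java
// Minimal sketch: a catch (Exception e) block never sees an Error such as
// NoSuchMethodError, while a catch (Throwable t) block does.
class ThrowableCatchDemo {

    /** Hypothetical stand-in for a call into a library jar that was replaced without recompiling. */
    static void callIntoMismatchedJar() {
        throw new NoSuchMethodError("simulated binary incompatibility");
    }

    /** Reports which handler actually caught the problem. */
    static String render() {
        try {
            try {
                callIntoMismatchedJar();
                return "ok";
            } catch (Exception e) {
                // Not reached: NoSuchMethodError extends Error, not Exception.
                return "exception";
            }
        } catch (Throwable t) {
            // Reached: Throwable is the common superclass of Error and Exception.
            return "throwable: " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

If the application's error page is triggered by something like this, logging inside a catch (Throwable t) handler around the rendering code is one way to make the otherwise silent failure show up in the logs.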
Re: [iText-questions] Extracting a page as an Image
Date: Wed, 24 Feb 2010 12:04:12 +0100 From: jan.lendh...@vevention.com To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Extracting a page as an Image Hi there, Just a short question, as I did not find anything on Google, or I used the wrong search phrases. I would like to extract the whole first page of a PDF file as an image (PNG or JPEG or so). Again, I think the pdf toolkit or the xpdf stuff from foolabs works. This is also what I did with the open source renderer, and earlier I had an interest in getting OCR samples. Just skimming my history file, there is something called pdfimages, and I can also find the command lines where I used a modified open source renderer to do the same thing. As I recall that wasn't too difficult, but I think I archived all that code. It does seem I tried to use ImageMagick; I can't remember if that worked. http://www.google.com/#hl=en&safe=off&q=pdfimages&aq=f&aqi=g10&aql=&oq= http://www.foolabs.com/xpdf/ and the final solution, https://pdf-renderer.dev.java.net/ Is this possible or are there any examples out there? Thanks a lot in advance, Jan
Re: [iText-questions] Extracting a page as an Image
Date: Wed, 24 Feb 2010 05:08:44 -0800 From: wasegra...@bellsouth.net To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Extracting a page as an Image - Original Message From: wasegraves To: Post all your questions about iText here Sent: Wed, February 24, 2010 7:58:36 AM Subject: Re: [iText-questions] Extracting a page as an Image - Original Message From: Mike Marchywka To: itext-questions@lists.sourceforge.net Sent: Wed, February 24, 2010 6:38:04 AM Subject: Re: [iText-questions] Extracting a page as an Image Date: Wed, 24 Feb 2010 12:04:12 +0100 From: jan.lendh...@vevention.com To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Extracting a page as an Image ... Just a short question, as I did not find anything on google or I used the wrong search phrases. ... I would like to extract the whole first page of a pdf file as an image (png or jpeg or so). ... It does seem I tried to use imagemagick, can't remember if that worked. It worked when I used it to convert an AcroForm to a JPEG image. You realize, of course, that this is not an iText question. That said, you could wrap a PDF in an Image object with iText. There are ample examples in the book to show you how this is done. These questions keep coming up and the tools should be of general interest to people trying to use itext. Best regards, Bill Segraves
Re: [iText-questions] Extracting a page as an Image
Date: Wed, 24 Feb 2010 14:31:51 +0100 From: jmr...@gmail.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Extracting a page as an Image Hello, I did this some years ago: use pdftk (based on iText) to extract the page(s) and then convert them to an image using Ghostscript. pdftk is at http://www.accesspdf.com/pdftk/ ghostscript is at http://www.gnu.org/software/ghostscript/ Bruno et al., I think you could save time with a page of tools somewhere on the iText site, similar to the FAQ, and an acronym list (THNTDWI: this has nothing to do with iText, for example). We have several approaches: pdftk, foolabs xpdf, ImageMagick (which I did just check works with convert x.pdf y.jpg) and my own additions to the open source renderer link I posted earlier (I had to fish this out of an archive; I only had to add a couple of classes and chop off the GUI stuff). Best regards Jose 2010/2/24 wasegraves - Original Message From: Mike Marchywka To: itext-questions@lists.sourceforge.net Sent: Wed, February 24, 2010 6:38:04 AM Subject: Re: [iText-questions] Extracting a page as an Image Date: Wed, 24 Feb 2010 12:04:12 +0100 From: jan.lendh...@vevention.com To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Extracting a page as an Image ... Just a short question, as I did not find anything on google or I used the wrong search phrases. ... I would like to extract the whole first page of a pdf file as an image (png or jpeg or so). ... It does seem I tried to use imagemagick, can't remember if that worked. It worked when I used it to convert an AcroForm to a JPEG image. Best regards, Bill Segraves ...
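The external-tool approaches mentioned in this thread could be driven from Java roughly as follows. This is only a sketch: it assumes the ImageMagick `convert` and Ghostscript `gs` executables are installed and on the PATH, and the file names, class name and method names are hypothetical. The `[0]` suffix is ImageMagick's zero-based page selector; the Ghostscript switches select a one-based page range and a PNG output device.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: build (and optionally run) command lines that render one PDF page to an image.
class PageToImageCommands {

    /** ImageMagick: "in.pdf[0]" selects the first page (zero-based index). */
    static List<String> imageMagick(String pdf, int pageIndex, String out) {
        return Arrays.asList("convert", pdf + "[" + pageIndex + "]", out);
    }

    /** Ghostscript: render a single page (one-based) to a 24-bit PNG at the given resolution. */
    static List<String> ghostscript(String pdf, int page, int dpi, String out) {
        return Arrays.asList("gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=png16m",
                "-r" + dpi, "-dFirstPage=" + page, "-dLastPage=" + page,
                "-sOutputFile=" + out, pdf);
    }

    /** Runs a command and returns its exit code; requires the tool to be installed. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Only prints the commands; call run(...) to execute them.
        System.out.println(String.join(" ", imageMagick("x.pdf", 0, "y.png")));
        System.out.println(String.join(" ", ghostscript("x.pdf", 1, 150, "y.png")));
    }
}
```

Neither command involves iText; as noted in the thread, page rendering is outside what iText itself offers.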
Re: [iText-questions] Retrieve Fonts from an existing PDF?
Date: Tue, 23 Feb 2010 09:26:10 +0100 From: i...@1t3xt.info To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Retrieve Fonts from an existing PDF? Nirmal Fernando wrote: How can I retrieve Fonts of an existing PDF? Is there a way in iText to read the fonts embedded and replace them with a different type of font?? This isn't necessarily impossible, but in most cases it's very difficult and in some cases it's very unwise. For instance: what if only a subset of the font is present in the PDF? Then you'll never be able to retrieve the full font, and will you replace this subset with a full font or a corresponding subset? Do you know anything about encoding? Do you know anything about CMaps? Do you know anything about the differences in metrics? For instance: the width of the words "Foobar Film Festival" is 178.74 pt in Helvetica, but only 157.90 pt in Times-Roman for the same font size (12). In other words: if you replace Helvetica with Times-Roman, you'll screw up your entire layout. Remember that PDF is NOT a word processing format; every glyph is positioned at a predictable location. If you want to change the font, you need to do the layout all over again. That is: recreate the PDF from scratch. I guess I'd just ask how hard it would have been for the original author to include enough information, either with a standard or an agreed-upon private convention, for the OP to do any required re-layout. That is, what would be involved in creating the original PDF with enough logical structure to make it likely you could accommodate a font change (or just extract the words, which are often the only thing people care about, not fonts and columns etc.)? This is just a variant of my recurring rant (in many cases we want information, not pictures, to feed other computer programs), to which you contribute a good point: fonts are complicated and human readability adds a lot of stuff.
A human audience of course is perfectly valid and there is nothing wrong with using graphics to make it easier for the reader. However, having a short tractable alphabet, rather than say words composed of unordered and unbounded collection of things, or even 24-bit color pictures, is a big asset in organizing information- once you start adding stuff it can be confusing unless you take some care to separate the information from artwork. -- This answer is provided by 1T3XT BVBA http://www.1t3xt.com/ - http://www.1t3xt.info
Re: [iText-questions] After rotation, images are getting displced in iText PDF
Date: Mon, 22 Feb 2010 20:33:10 -0800 From: ra...@vinfotech.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] After rotation, images are getting displced in iText PDF Hi, It's very urgent. We need help with this. Looking forward to positive support. If this is that urgent, can't you just kluge a solution by translating the output, or is this more involved than a one-line call? I thought the conclusion was that something to that effect is the best iText solution. Designing an API is always a tradeoff, and it is possible there is no center parameter anywhere. Am I right if I say that this is more like a Math question (algebra) than an iText question? -- This answer is provided by 1T3XT BVBA rhul_rk wrote: Yes, you are right, there is a lot of calculation and algebra in this. But each and every object plotted in the PDF exists in an imaginary (hidden) rectangle. When we plot an image (without rotation) it is plotted at the correct location, but when we rotate an object, it gets displaced in the PDF. As stated earlier, each object exists in the imaginary (hidden) rectangle, which has Plane and Absolute height and width. On rotation, the absolute height and width change but the Plane height and width remain the same. These are the height and width of the imaginary (hidden) rectangle. Say we are rotating the object in iTextSharp with the help of Image.RotationDegree=45. It rotates the object by 45 degrees about the bottom left corner of the imaginary (hidden) rectangle instead of the actual image's bottom left corner. We just want to ask: is there any mechanism in iTextSharp to rotate an object about its center point instead of the bottom left corner? Thanks for your quick reply. Looking forward to your support. Thanks, Rahul 1T3XT info wrote: rhul_rk wrote: One more thing we would like to mention here: in the Flex application items rotate about the center point, and in the iTextSharp PDF items rotate about the lower left corner.
Is there any sort of mechanism in iTextSharp to rotate the image about its center point by a given number of degrees? Hope we are clear with our issue. Am I right if I say that this is more like a Math question (algebra) than an iText question? -- This answer is provided by 1T3XT BVBA http://www.1t3xt.com/ - http://www.1t3xt.info -- View this message in context: http://old.nabble.com/After-rotation%2C-images-are-getting-displced-in-iText-PDF-tp27683306p27698368.html Sent from the iText - General mailing list archive at Nabble.com.
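The "translate the output" workaround hinted at in this thread comes down to a little trigonometry. The sketch below assumes, as the poster reports, that the rotation pivot is the lower-left corner of the unrotated box and that angles are counter-clockwise; it does not use or describe any iText or iTextSharp API, and the class and method names are hypothetical. It computes the translation that moves the rotated center back to where the unrotated center was.

```java
// Sketch: offset needed so that a rotation about the lower-left corner
// looks like a rotation about the center of a w x h box.
class CenterRotation {

    /**
     * Returns {dx, dy}: add this to the position after rotating the box by
     * `degrees` (counter-clockwise) about its lower-left corner, and the
     * center of the box ends up where it was before the rotation.
     */
    static double[] offsetToKeepCenter(double w, double h, double degrees) {
        double t = Math.toRadians(degrees);
        double cos = Math.cos(t);
        double sin = Math.sin(t);
        double cx = w / 2;               // center relative to the lower-left corner
        double cy = h / 2;
        double rx = cx * cos - cy * sin; // where the center lands after the rotation
        double ry = cx * sin + cy * cos;
        return new double[] { cx - rx, cy - ry };
    }

    public static void main(String[] args) {
        double[] d = offsetToKeepCenter(100, 50, 90);
        System.out.printf("dx=%.2f dy=%.2f%n", d[0], d[1]);
    }
}
```

For a 100 x 50 box rotated by 90 degrees, the center moves from (50, 25) to (-25, 50), so the correction is (75, -25). Whether the library positions the rotated image by its original corner or by the corner of its new bounding box is a separate question that would need to be checked against the library itself.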
Re: [iText-questions] After rotation, images are getting displced in iText PDF
It looks like it was in your response. Hotmail has been a big problem, executing text that happens to look like HTML even with text mode on, but in this case it seems it made it out since it made it back. From: ra...@vinfotech.com Mike, thanks for trying to post the answer, but I got an empty reply from you. It might have happened accidentally. I will be glad to receive the answer. Thanks!!! Rahul Khadikar -Original Message- From: Mike Marchywka [mailto:marchy...@hotmail.com] Sent: Tuesday, February 23, 2010 4:42 PM It's very urgent. We need help in this. Looking towards positive support. If this is that urgent, can't you just kluge a solution by translating the output, or is this more involved than a one-line call? I thought the conclusion was that something to that effect is the best iText solution. Designing an API is always a tradeoff, and it is possible there is no center parameter anywhere. Am I right if I say that this is more like a Math question (algebra) than an iText question? -- This answer is provided by 1T3XT BVBA
Re: [iText-questions] Using Images extracted from a pdf
Date: Tue, 23 Feb 2010 06:52:54 -0800 From: fernandogomes...@hotmail.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Using Images extracted from a pdf can anyone help me one more time.. i don't know what to do .. I need to get the image bytes, now decoded... Probably the open source PDF renderer would answer your questions and provide more context. I seem to recall it was pretty easy to modify to extract page images in your favorite format; probably in the process of rendering the included images are extracted etc.
[code]
String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
String filter = pdfStrem.get(PdfName.FILTER).toString();
int bits = Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
int width = Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
int height = Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
PdfDictionary param = (PdfDictionary)pdfStrem.get(PdfName.DECODEPARMS);
int colors = Integer.valueOf(param.get(PdfName.COLORS).toString());
int predictor = Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
int colums = Integer.valueOf(param.get(PdfName.COLUMNS).toString());
if(filter.equals("/FlateDecode")) {
    byte[] bytesDecod = PdfReader.FlateDecode(bytes);
[/code]
this is all the information that I can pull out of the PDF. What do I have to do to create my image in general .. I'm trying to do it, or learn, but this is hard, all my attempts have failed. ty Fernando Gomes wrote: Sirs, really sorry for duplicating, can the other topics be deleted? so sorry ..:blush: many thanks for the help.. and such fast help .. i will study more .. Leonard Rosenthol-3 wrote: You are assuming that PDF maintains the PNG nature of the image - that is NOT the case. PDF only supports two kinds of images: JPEG (which is why this works) and raw bitmaps (aka an array of bits). So in your case, with the PNG, it is transcoded into the latter case and so if you want it back you will need to reverse the process on your end.
for this response in other same email :blush: quote of 1T3XT info below .. really thanks. I should have seen the relevance of the chapter that you mentioned, I will read it again and very carefully. My English is very weak, and it is very difficult to read. you are very funny, I laughed a lot. I know I deserved the scolding. Really thanks for your help. I will test and then come back to post the result. Thank you! 1T3XT info wrote: Fernando Henrique Gomes wrote: the problem is when I insert an image in PNG format and then try to get the same... OK, we're talking about a PNG. If you've read chapter 10 of the 2nd edition of iText in Action, you know that PNGs are transformed into zipped pixels. If you didn't know, you should read the book! on here i try to take that image...
[code]
int XrefIndex = ((PRIndirectReference)obj).getNumber();
PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
byte[] bytes = PdfReader.getStreamBytesRaw((PRStream)pdfStrem);
if ((bytes != null)) {
    String fileName = "Image_P" + pageNumber + "_";
    File file = new File(fileName);
    FileOutputStream fw = new FileOutputStream(file);
    fw.write(bytes);
    fw.flush();
    fw.close();
    BufferedImage img2 = ImageIO.read(file);
    com.lowagie.text.Image img = com.lowagie.text.Image.getInstance(file.toURL());
}
[/code]
img2 returned a null Of course, why do you think that would work??? in line of img .. has an Exception Image_P1_ is not a recognized imageformat Of course, you're sending iText a bunch of pixels, but: what are the dimensions of the image, how many bits are there per component? when i try to do :
[code]
Image image = Toolkit.getDefaultToolkit().createImage(bytes);
[/code]
and before create an image from this image getting the width and height from my PdfStream (create a buffered and draw the image) when i serialize on a file and visualize this.. this image in a fucking black picture ..
all black -.- It's because you don't have a fucking clue about what you're doing :P Hehe, I was waiting for an occasion to use the F* word on the list. Thanks! if i use JPEG encode for my images.. all the 3 solution i have .. its ok.. have effects.. Well, that's because iText stores JPEGs literally as a JPEG without changing any of the bytes. If you look inside, you'll see that the filter is DCTDecode (Discrete Cosine Transform). i can visualize my images how i created them .. perfect.. but if i change the JPEG ... for any other encode.. that does not have effect .. No idea what you're saying here, but you also need to study images. can anyone help me plz ? This example doesn't involve iText, but explains what you're missing. Let's create an image byte per byte: byte b[] = new byte[256 * 3]; for (int i = 0; i < 256; i++) { b[i * 3] = (byte) (255 - i); b[i * 3 + 1] = (byte) (255 - i); b[i * 3 + 2] = (byte) i; }
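The message is cut off at this point in the archive. As a rough illustration of the idea (not the original continuation), the sketch below turns such an array of raw RGB bytes into a BufferedImage using only the JDK. It assumes 8 bits per component, three components per pixel (RGB), rows stored top to bottom without padding, and no predictor applied to the data; the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

// Sketch: interpret raw 8-bit RGB triples as pixels of a BufferedImage.
class RawRgbToImage {

    /** The 256-pixel gradient from the message above: one RGB triple per pixel. */
    static byte[] gradient() {
        byte[] b = new byte[256 * 3];
        for (int i = 0; i < 256; i++) {
            b[i * 3] = (byte) (255 - i);     // red
            b[i * 3 + 1] = (byte) (255 - i); // green
            b[i * 3 + 2] = (byte) i;         // blue
        }
        return b;
    }

    /** Builds an image from raw RGB bytes; the caller must know width and height. */
    static BufferedImage fromRgb(byte[] rgb, int width, int height) {
        if (rgb.length < width * height * 3) {
            throw new IllegalArgumentException("not enough bytes for " + width + "x" + height);
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = (y * width + x) * 3;
                int r = rgb[i] & 0xFF;       // mask: Java bytes are signed
                int g = rgb[i + 1] & 0xFF;
                int b = rgb[i + 2] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = fromRgb(gradient(), 256, 1);
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

The resulting image can be written out with `ImageIO.write(img, "png", file)`. The raw bytes alone carry no width, height or bit depth, which is why handing them to a generic image loader fails: those values have to come from the image dictionary in the PDF.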
Re: [iText-questions] Using Images extracted from a pdf
You can always use the command line tool in pdf toolkit or xpdf, I can't remember which, but there is something like pdf2image, similar to pdf2text to extract text. Date: Tue, 23 Feb 2010 12:43:28 -0800 From: fernandogomes...@hotmail.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Using Images extracted from a pdf I'm going crazy with it. as you can see, I have never manipulated images at such a low level, and do not have much sense of how things work. I have been searching for days to finish my solution, and I'm already getting stressed. i keep testing methods .. i try one.. and then try another choice.. -.- can you give me some more assistance on how I can turn this array of bytes back into an image? couldn't there be just one class in the api that did it? : P Pdfimages buf = new pdfimages (myRawImageByteArray); buf.getAsBufferedImage (); : P if you say you can not help me, all right, but can you point me to some material I can rely on to get this done? thanks. Leonard Rosenthol-3 wrote: The image is decompressed and then injected into the PDF. Same with EVERY TYPE of image EXCEPT JPEG. -Original Message- From: Fernando Gomes [mailto:fernandogomes...@hotmail.com] Sent: Tuesday, February 23, 2010 3:21 PM To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Using Images extracted from a pdf ty .. I have a question. when I insert an image that is not jpeg, what exactly happens with it? say that it is a PNG: is it decompressed to be injected into the PDF? or does it keep its PNG format, but with the bytes encoded with FlateEncode .. so it is a matter of finding the filter and decoding to get it back. and if the image is uncompressed before being inserted into the PDF, how do I know which type of encoding the image had? Leonard Rosenthol-3 wrote: Bits per pixel is the BitsPerComponent value in the image object. Pixels per line (POR LINHA) is _NOT_ Width * bits. It's Width * NumComponents, where NumComponents is based on the colorspace in question (eg. RGB == 3, CMYK == 4). -Original Message- From: Fernando Gomes [mailto:fernandogomes...@hotmail.com] Sent: Tuesday, February 23, 2010 2:00 PM To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Using Images extracted from a pdf
[code]
public static BufferedImage createBufferedImageFromRawBytes(byte[] bytes, int width, int height, int bits) throws BadElementException, MalformedURLException, IOException {
    com.lowagie.text.Image img = com.lowagie.text.Image.getInstance(bytes);
    DataBuffer db = new DataBufferByte(img.getRawData(), img.getRawData().length);
    WritableRaster raster = Raster.createPackedRaster(db, // data buffer
        width,      // width (LARGURA)
        height,     // height (ALTURA)
        width*bits, // width * bits per pixel = pixels per line (POR LINHA) - scanlineStride
        // bits,    // bits per pixel - pixelStride
        new int [] {bits}, null);
    ColorSpace cs = ColorSpace.getInstance(img.getColorspace());
    ColorModel cm = new ComponentColorModel(cs, false, false, Transparency.OPAQUE, db.getDataType());
    BufferedImage bi = new BufferedImage(cm, raster, false, null);
    return null;
}
[/code]
this code is as far as I could get, but there are variables needed to generate the bufferedImage that I don't know, please someone help me see if I'm on track. If I wrote something wrong. Fernando Gomes wrote: can anyone help me one more time.. i don't know what to do .. I need to get the image bytes, now decoded...
[code]
String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
String filter = pdfStrem.get(PdfName.FILTER).toString();
int bits = Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
int width = Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
int height = Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
PdfDictionary param = (PdfDictionary)pdfStrem.get(PdfName.DECODEPARMS);
int colors = Integer.valueOf(param.get(PdfName.COLORS).toString());
int predictor = Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
int colums = Integer.valueOf(param.get(PdfName.COLUMNS).toString());
if(filter.equals("/FlateDecode")) {
    byte[] bytesDecod = PdfReader.FlateDecode(bytes);
[/code]
this is all the information that I can pull out of the PDF. What do I have to do to create my image in general .. I'm trying to do it, or learn, but this is hard, all my attempts have failed. ty Fernando Gomes wrote: Sirs, really sorry for duplicating, can the other topics be deleted? so sorry ..:blush: many thanks for the help.. and such fast help .. i will study more .. Leonard Rosenthol-3 wrote: You are assuming that PDF maintains the PNG nature of the image - that is NOT the case. PDF only supports two kinds of images: JPEG (which is why this works) and raw bitmaps (aka an array of bits). So in your case, with the PNG, it is transcoded into the latter case and so if you want it back you will need to reverse the process on your end.
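Leonard's correction about the row size can be written down as a small calculation. The sketch below assumes the color space is one of the three simple device spaces named by their PDF names, and that each row of samples is padded to a whole number of bytes; indexed, ICC-based and other color spaces are not handled, and the class and method names are hypothetical.

```java
// Sketch: number of components per pixel and bytes per row for a raw PDF image.
class ScanlineStride {

    /** Components per pixel for the simple device color spaces only. */
    static int componentsFor(String colorSpace) {
        switch (colorSpace) {
            case "/DeviceGray": return 1;
            case "/DeviceRGB":  return 3;
            case "/DeviceCMYK": return 4;
            default:
                throw new IllegalArgumentException("unhandled color space: " + colorSpace);
        }
    }

    /** Bytes per row: width * components * bitsPerComponent bits, rounded up to a whole byte. */
    static int bytesPerRow(int width, int components, int bitsPerComponent) {
        return (width * components * bitsPerComponent + 7) / 8;
    }

    public static void main(String[] args) {
        int stride = bytesPerRow(100, componentsFor("/DeviceRGB"), 8);
        System.out.println("100 px RGB, 8 bits per component: " + stride + " bytes per row");
    }
}
```

With these two values, the expected size of the decoded data is bytesPerRow * height. If the inflated data is larger than that and the image dictionary carries a Predictor entry, the predictor still has to be undone before the bytes can be read as pixels.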
Re: [iText-questions] Writing to ServletOutputStream
Date: Mon, 22 Feb 2010 11:45:43 +0100 From: To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Writing to ServletOutputStream Hi, Please note that I posted links to pastie.org for better readability of the code snippets. I am using iText to print graphs produced with the JUNG Framework to PDFs. To achieve this I have the following code: http://www.pastie.org/private/rsy6wzneedpvo4dai3vcgw Writing the graphics object to the PDF is done by the following code: http://www.pastie.org/private/zgwocpvjih16j2cmcdmza The produced ByteArrayOutputStream is used to save the content to a file (works great - I get a wonderful PDF): http://www.pastie.org/private/imxi9cmdrzowop9ivxgnba The reason why I am generating a ByteArrayOutputStream is that I additionally want to write the created PDF content to a ServletOutputStream: http://www.pastie.org/private/r4h2lad26xbwjokoh0zbq Unfortunately the only thing I get is a PDF document in the desired dimension but blank - no content :( I am using almost the same code for writing text content to a ServletOutputStream and I do get the content - so I think the code of the response is ok. Is there a problem with writing ByteArrayOutputStream content containing iText data to ServletOutputStreams? It is really weird that everything works when I write the ByteArrayOutputStream content to a FileOutputStream and I don't get anything when I write it to the ServletOutputStream :( I didn't hit the links and I'm not sure what you mean by blank, but do you set the server's content type to something telling your browser it is PDF? If you hit your server with something more diagnostic than artistic, like wget instead of IE, you can at least see what it thinks is going on: is the byte count right, etc. You may even be able to do some diffs and determine if there is truncation or corruption etc. Also check the servlet debugging information, which you hopefully generate :) It would be great if you could take a look at my code.
Thank you in advance! Sebastian Furth
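The pastie links are no longer readable here, so the original servlet code is unknown. As a minimal sketch of the byte-for-byte copy the question is about, the method below writes a ByteArrayOutputStream to any binary OutputStream and reports the byte count, which is the number to compare against what a client such as wget receives. The class and method names are hypothetical; the servlet calls are mentioned only in comments and are an assumption about how the method would be used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: copy buffered PDF bytes unchanged to a binary stream.
class PdfBytesCopy {

    /**
     * Writes the buffered bytes to `out` without any character conversion
     * and returns how many bytes were written.
     *
     * In a servlet (assumed usage), `out` would be response.getOutputStream(),
     * not response.getWriter(), and the caller would first set
     * response.setContentType("application/pdf") and
     * response.setContentLength(pdf.size()).
     */
    static int send(ByteArrayOutputStream pdf, OutputStream out) throws IOException {
        pdf.writeTo(out); // raw byte copy, no charset involved
        out.flush();
        return pdf.size();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        pdf.write(new byte[] { 0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xFF });
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(send(pdf, sink) + " bytes written");
    }
}
```

If the same buffer produces a good file through a FileOutputStream but a blank document through the response, the difference is more likely in how the response is configured or filtered than in the buffer itself.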
Re: [iText-questions] Writing to ServletOutputStream
Date: Mon, 22 Feb 2010 14:56:45 +0100 From: i...@1t3xt.info To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Writing to ServletOutputStream

Sebastian Furth wrote: Something happens during the request because it returns a pdf in the dimension set in the code - but there is absolutely no content (blank).

Sounds like the blank page problem described in the book (1st and 2nd edition). This happens if you shave the upper bit from every byte. The PDF structure is preserved, and as a result a viewer can show you all the pages of the PDF, the bookmarks, etc. But all binary data, for instance the page content stream, is corrupted (of course: you've thrown away 1/8 of the information). If that's what's happening in your case, you have a configuration error somewhere.

Cygwin has an octal dump utility (od, IIRC); the first few lines of output would at least let you know if that is the problem. Offhand I don't know who would assume you have text, but again you'd have to suspect a wrong content type somewhere. A ByteArray is of course supposed to be just that, with no text assumption, but if someone manipulates the bytes there could be sign issues; all kinds of things could happen. I just made a funny post on the cygwin-talk list about that bit being used for parity only LOL.
Re: [iText-questions] Writing to ServletOutputStream
Date: Mon, 22 Feb 2010 15:14:55 +0100 From: sebastian.fu...@googlemail.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Writing to ServletOutputStream

Thanks for your reply! Is it possible that the ServletOutputStream shaves the upper bit from every byte? I have a method which returns a ByteArrayOutputStream containing the pdf data. If I delegate this OutputStream to a FileOutputStream everything is ok, but if I use the ServletOutputStream there is no content in the pdf. If the ServletOutputStream is responsible for this, do you have an idea how I can prevent it from doing this?

The javadocs claim it is for binary data, but maybe you are using the wrong method, or manipulating bytes as char, or have subclassed it, I dunno: http://www.google.com/#hl=ensafe=offq=site%3Asun.com+ServletOutputStreamaq=faqi=oq=fp=d95f0d161f018361

Thank you in advance! Best regards.

Sebastian Furth
Re: [iText-questions] Writing to ServletOutputStream
LOL, I don't imagine this will help, but the high bit is probably being set to zero. Whoever is creating all those content attach/disposition things probably doesn't know it is not text.

    marchywka:/home/marchywka# od -ax Desktop/Car-Diagnosis_Visualization.pdf | sed -e 's/ /\n/g' | grep ^$ | cut -c 1 | sort | uniq -c
        152 0
        146 1
        273 2
       1537 3
        202 4
        194 5
        327 6
        215 7
    marchywka:/home/marchywka#

Date: Mon, 22 Feb 2010 15:30:21 +0100 From: sebastian.fu...@googlemail.com To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] Writing to ServletOutputStream

Of course they were good hints - the problem is me :) OK, I will try to explain my problem: I have created a JSP servlet which shall return a PDF document on request.

    // Get the file content
    ByteArrayOutputStream bstream = de.d3web.empiricalTesting.caseVisualization.jung.JUNGCaseVisualizer.getInstance().getByteArrayOutputStream(t.getRepository());
    // Response
    response.setContentType("application/pdf");
    response.setHeader("Content-Disposition", "attachment;filename=\"" + filename + "\"");
    response.setContentLength(bstream.size());
    // Write the data from the ByteArray to the ServletOutputStream of the response
    bstream.writeTo(response.getOutputStream());
    response.flushBuffer();

The pdf document is created by iText and should contain a graph (Graphics2D object).

    init(cases);
    int w = vv.getGraphLayout().getSize().width;
    int h = vv.getGraphLayout().getSize().height;
    ByteArrayOutputStream bstream = new ByteArrayOutputStream();
    Document document = new Document();
    try {
        PdfWriter writer = PdfWriter.getInstance(document, bstream);
        document.setPageSize(new Rectangle(w, h));
        document.open();
        PdfContentByte cb = writer.getDirectContent();
        PdfTemplate tp = cb.createTemplate(w, h);
        Graphics2D g2 = tp.createGraphics(w, h);
        paintGraph(g2);
        g2.dispose();
        tp.sanityCheck();
        cb.addTemplate(tp, 0, 0);
        cb.sanityCheck();
        document.close();
    } catch (DocumentException e) {
        Logger.getLogger(this.getClass().getName())
            .warning("Error while writing to file. The file was not created. " + e.getMessage());
    }
    return bstream;

If I delegate the ByteArrayOutputStream created in the method posted above to a FileOutputStream, the pdf has the desired content - but if I delegate it to a ServletOutputStream, the content (the Graphics2D object) is missing. I attached the pdf where the graphics is missing. Maybe you can get some information out of it.

Thank you in advance! Best regards

Sebastian Furth

2010/2/22 1T3XT info

Sebastian Furth wrote: Once again, thanks for your reply. Unfortunately I think I don't have enough experience to understand your hints :)

They were good hints though; I thought everybody knew wget. If possible, can you explain your problem as well as Mike explained how to use wget? For instance: save the PDF on your local system and open it using a text editor such as Notepad++, Wordpad, ... What do you see?
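The od check suggested in this thread can also be done from Java. The following is only a minimal sketch, not code from the thread: the class name HighBitCheck and its methods are hypothetical. It counts how many bytes of a saved file have the top bit set. A PDF with compressed content streams would normally contain many such bytes, so if the copy fetched from the servlet contains none while the copy written through FileOutputStream does, that would be consistent with the bit-stripping explanation given above. It does not by itself say which component strips the bit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HighBitCheck {

    /** Counts bytes whose most significant bit is set (0x80..0xFF). */
    public static int countHighBitBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                count++;
            }
        }
        return count;
    }

    /** Histogram of the high nibble (first hex digit) of every byte. */
    public static int[] highNibbleHistogram(byte[] data) {
        int[] hist = new int[16];
        for (byte b : data) {
            hist[(b & 0xF0) >>> 4]++;
        }
        return hist;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a PDF saved from the servlet response (e.g. with wget)
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("total bytes:        " + pdf.length);
        System.out.println("bytes with bit 7:   " + countHighBitBytes(pdf));
        int[] hist = highNibbleHistogram(pdf);
        for (int i = 0; i < hist.length; i++) {
            System.out.println(Integer.toHexString(i) + "x: " + hist[i]);
        }
    }
}
```

Running it on both the file-based PDF and the servlet-delivered PDF and comparing the two outputs gives roughly the same information as diffing the od dumps.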
Re: [iText-questions] RandomAccessFileOrArray file load in memory
Date: Mon, 15 Feb 2010 07:07:50 -0800 From: To: itext-questions@lists.sourceforge.net Subject: [iText-questions] RandomAccessFileOrArray file load in memory

Hello, I am facing a big problem with out-of-memory errors on the server when a lot of users want to get a PDF file from a TIFF file.

This may be considered OT for iText, and maybe someone has a better answer, but after all my comments about resource usage re PDF machinations, the short answer is: see if anyone at sun.com has similar issues and solutions for other server things and try subbing them into iText for your own needs, unless Bruno has canned implementation alternatives. For requests that just take a long time, you may have to change your paradigm and notify the user later via email or something when the result is done. These issues are not specific to iText or PDF.

I use iText 1.2.7. Is it possible to override RandomAccessFileOrArray to replace the byte arrayIn[] with a temporary file?

Generally memory management in Java is quite limited, and long before you run out of memory you would want to do things like maximize low-level cache hits etc. However, there may be something on sun.com, as this is likely to be a common issue when you scale Java apps (I've never bothered to look myself, but it wouldn't just be about iText), and you have the source code so you can take alternative approaches. Code is never really platform independent, and implementation details make or break real-world utility (hence the issues with PDF resource needs and benefits). Also note that if all the users are translating the same image, in-memory caching of single objects, not duplicated hundreds of times, can be a big savings. You need a sharing mechanism in this case. A scalable iText or something like that would probably be a commercial product :)

Assuming you have zero virtual memory right now, this is just going to slow things down even more (presumably your current out-of-memory condition has already been addressed with increased heap size to the point of doing a lot of VM thrashing), and it could get to the point where each request takes forever as the whole system thrashes between requests (you can probably write a simple equation to determine the number of executing requests given the arrival rate and processing time, with processing time increasing with the number of active requests). You might just be better off limiting the number of active requests, queuing the rest, and notifying the user when done, if currently you are trying to return a complete PDF to the user via the requesting HTTP connection.

Or use a temporary file when the file is larger than 5 KB, for example?

Thank you,
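The suggestion to limit the number of active requests can be sketched with a counting semaphore from java.util.concurrent. This is an illustration under assumptions, not something taken from iText or from the thread: the class name ConversionLimiter, the limit of concurrent jobs, and the timeout are all hypothetical, and the job passed in stands for whatever TIFF-to-PDF conversion the servlet performs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ConversionLimiter {

    private final Semaphore permits;

    /** maxConcurrent is a hypothetical tuning knob: how many conversions may run at once. */
    public ConversionLimiter(int maxConcurrent) {
        // fair = true: waiting requests are served in arrival order
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /**
     * Runs the job once a permit is free. Returns null if no permit became
     * available within timeoutMillis; the caller could then queue the request
     * and notify the user later instead of holding the HTTP connection open.
     */
    public <T> T run(Supplier<T> job, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
        if (!acquired) {
            return null;
        }
        try {
            return job.get();
        } finally {
            permits.release(); // always give the permit back, even if the job throws
        }
    }

    public int availablePermits() {
        return permits.availablePermits();
    }
}
```

With this shape, peak memory is bounded by roughly the limit times the memory of one conversion, rather than by the number of simultaneous users. Choosing the limit still requires measuring how much heap a single conversion needs.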
Re: [iText-questions] Unable To Set Checkboxes With Complicated Names
From: lrose...@adobe.com To: itext-questions@lists.sourceforge.net Date: Sun, 14 Feb 2010 07:03:21 -0800 Subject: Re: [iText-questions] Unable To Set Checkboxes With Complicated Names

Actually, I would think you would LIKE the fact that the IRS switched to XFA-based forms! This means that it's all XML-based, but inside of the simple PDF wrapper. So now you can quite easily get all the information you need - layout, data, etc. No muss, no fuss.

Yes, PDF forms support submission of data in a variety of formats - HTML, FDF, XFDF, XML (custom grammars) and PDF. It's up to the author of the form to choose which they want, based on the system they are integrating with.

The IRS DOES allow submission of just data - that's how vendors such as Quicken, TurboTax, etc. do their electronic filings. However, you need to establish a trusted relationship with the IRS in order to be able to do this, due to reasonable concerns about DoS attacks, etc.

Yeah well, right now I want to pay my taxes and not have to either pay anyone or type numbers into something from which I cannot get them back out. It wasn't entirely a complaint, although I'm glad you responded, just to reiterate that some customers do use pdf as a data sink. I haven't looked recently, since I had a hard time extracting text from the instructions a while back and I could not find a free way to send in the forms. I'm not sure why you need any more trust for posting XML than for writing a script to browse their website (are they really concerned about DoS attacks LOL?) or doing a credit card transaction over the internet. Certainly there are some issues with user confusion, but still this seems to be restricted artificially beyond what similar transactions do for security. It would be just as easy for them to send you back a copy for verification (is this what you really meant?), like most transactions do, rather than push a front end that happens to look like paper and then make approved e-paper pushers. Quite simply, as you list the approved vendors, this looks like a business decision as much as anything.

FYI: The SEC supports submission in a variety of formats.

Generally they are dealing with structured submissions designed for data extraction. Personally I've never sent them anything and I have no idea what the front ends look like, but I can go get plain text and some limited XBRL filings. It would be nice to go to the IRS and get back similar results for myself, for example, not just a bunch of pixels, and I have no idea what they may be thinking here for the future.

Leonard
Re: [iText-questions] Unable To Set Checkboxes With Complicated Names
To: itext-questions@lists.sourceforge.net From: Date: Thu, 11 Feb 2010 15:09:24 + Subject: Re: [iText-questions] Unable To Set Checkboxes With Complicated Names

1T3XT info 1t3xt.info writes:

Jun Zuo wrote: Hi! I hope that someone can tell me how to set checkboxes with names like: topmostSubform[0].Page1[0].Line6cTable[0].#subform[1].c1_07[0] The index of subform is 1 not 0.

Are you sure you are trying to fill a static XFA form? It seems to me you're working with a dynamic form. (But I could be wrong.)

This is the 2009 Form 1040 from the IRS. I think it is a dynamic form! What is the trick for a dynamic form?

This rant does eventually relate back to PDF architecture, but you have to make it through the whole post LOL. The IRS forms are one of the things that brought me here, and now GA is making it hard to get paper forms any more. Do you know if there is a secure place where you can just post an XML file to the IRS instead of all this formatting junk left over from the days of paper? The data itself is quite simple and they have to separate it anyway; it would be easier just to have the home user press a submit button that extracts the form data and posts that back to the IRS. So I guess the question is: isn't there a way to design a pdf form such that you only submit the DATA back to the author (in this case the IRS) instead of all the format junk that they already have? Presumably this would let anyone submit tax data easily using the tool of their choice, even if free, with no loss of security. I can type in the few kB of numbers using (free) notepad and then post using wget. Most people of course wouldn't do that, but they probably want to import and export their numbers. If you design your docs to allow the numbers to go in and out anyway, why can't you just send the numbers back to the people who wanted them in the first place?

For example, the SEC accepts submissions of similar data in something called XBRL format, which is purely machine readable, with no indication of any interest in making copies of paper or stone tablets. While I guess you can call pdf a standard, it inherently interlinks the pictures with the numbers of real interest, making it hard to do some simple things.
Re: [iText-questions] FW: Mail to iTextSharp
Date: Thu, 11 Feb 2010 08:25:24 +0100 From: i...@1t3xt.info To: itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] FW: Mail to iTextSharp

suphat phuenpha wrote: If I want to convert a color PDF file to black and white or gray, how can I do that?

You can't.

Isn't this just a matter of finding all the color tables or models and changing them to grey? When you say "can't", do you mean there is nothing in the API that does this in a few lines, or that it is fundamentally impossible (without, say, rendering to pixels and making new PDF pages out of color-modified pixels)?
Re: [iText-questions] pdf does not receive image from a servlet
To: itext-questions@lists.sourceforge.net From: djayaward...@westpac.com.au Date: Tue, 9 Feb 2010 22:30:36 + Subject: Re: [iText-questions] pdf does not receive image from a servlet

Paulo Soares glintt.com writes: Forget about iText for now. Can you get the image into a byte array? Once you get there Image.getInstance() will always work. Paulo

Thanks Paulo/Mark, the byte array did work. I got the picture into a byte array and passed it to Image.getInstance();

After all of this, can you tell us what the problem was? Are you saying the byte[] version worked but not the one where you pass a URL, or did you have to change anything else?

Thanks

Donald
Re: [iText-questions] pdf does not receive image from a servlet
Date: Mon, 8 Feb 2010 07:00:43 +0100 From: br...@lowagie.com To: djayaward...@westpac.com.au; itext-questions@lists.sourceforge.net Subject: Re: [iText-questions] pdf does not receive image from a servlet

Donald Jayawardena wrote: Hi Bruno, Sorry for the confusion. Is it all right if I put the problem down here before I put it in iText?

Not really, but your question is more clear now.

2. I am having problems getting the image into the PDF from the createjpg servlet. I am running the following command to get the image into the PDF from the createjpg servlet:

    com.lowagie.text.Image iRiskNo = com.lowagie.text.Image.getInstance(new URL("http://accord-wf-dev.unix.srv.westpac.com.au:9704ccord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg"));

Is this syntax right or just a copy/paste error? There is no / after the port number and you apparently have a duplicated string. In any case, break this up into two steps and use an alternative iText method that takes something you can examine, like a byte[]. Also, that host name is not accessible to me.

    wget -O ~/xxx.jpg -S -v "http://accord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg"
    --2010-02-08 05:50:01-- http://accord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg
    Resolving accord-wf-dev.unix.srv.westpac.com.au... failed: Name or service not known.
    wget: unable to resolve host address `accord-wf-dev.unix.srv.westpac.com.au'

    wget -O ~/xxx.jpg -S -v "http://accord-wf-dev.unix.srv.westpac.com.au:9704/ccord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg"
    --2010-02-08 05:47:58-- http://accord-wf-dev.unix.srv.westpac.com.au:9704/ccord-wf-dev.unix.srv.westpac.com.au:9704/createjpg/createjpg
    Resolving accord-wf-dev.unix.srv.westpac.com.au... failed: Name or service not known.
    wget: unable to resolve host address `accord-wf-dev.unix.srv.westpac.com.au'

If this is a private name it doesn't help us.

createjpg is called from the server. During the debug, when this statement is being executed, I can see the output is showing:

    SEVERE: -- returning Frame NULL
    SEVERE: BaseDialog: owner frame is a java.awt.Frame

THIS IS NOT, I REPEAT, THIS IS NOT AN iText ERROR MESSAGE!!! Please understand that you don't have to look at iText when looking for the problem.

LOL, have you ever seen similarly labelled output from other servlets? Again, hunt around the Sun site or grep your servlet engine docs or your own servlet source code. How do you handle errors? I'm still thinking this is an attempt to pop up a server-side dialog box with no GUI available to the JVM, but it is just a guess.

From the logs, I could see that the call to the createjpg servlet happened successfully. The createjpg servlet produces the image as follows:

    JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    JPEGEncodeParam param = encoder.getDefaultJPEGEncodeParam(bufferedImage);
    param.setQuality(1.0f, false);
    encoder.setJPEGEncodeParam(param);
    encoder.encode(bufferedImage);

Hope this does not confuse you.

You should understand that calling a servlet from a client IS NOT THE SAME as calling a servlet from the server.

A servlet is just a Java class. You can't type that into a browser address bar, but the web server can invoke an instance of it when the right URL comes up. However, there is nothing to prevent you from calling one yourself in other server-side Java code if you can load the class definition somehow.

It works for the HTML, because the browser is calling the servlet through HTTP on the internet/intranet.

Actually, the presence of NATs can be confusing if you use the same IP or even hostname and don't account for this. Code making requests from the client may need a different host name than the server, and the server needn't know who 127.0.0.1 is or know its own name (there could be many virtual hosts).

That doesn't mean your server permissions allow you to call the servlet. This is really not an iText problem.

It does seem to be using your time :)

I'm forwarding this to the mailing list so that others can confirm. best regards, Bruno
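The advice to split the fetch into two steps and look at the bytes before handing them to iText could look roughly like the sketch below. The class name ImageFetch and its helper methods are hypothetical, and the URL is whatever the servlet address turns out to be. It assumes the servlet is meant to return a JPEG, as the createjpg code in this thread suggests, and only checks the first three bytes against the usual JPEG start-of-image signature (FF D8 FF).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class ImageFetch {

    /** Step 1: read the whole response into a byte[] that can be inspected. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** True if the data starts with the JPEG start-of-image signature FF D8 FF. */
    public static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the servlet URL as seen from the machine running this code
        try (InputStream in = new URL(args[0]).openStream()) {
            byte[] bytes = readAll(in);
            System.out.println(bytes.length + " bytes, JPEG signature: " + looksLikeJpeg(bytes));
            // Step 2 (not done here, to keep the sketch free of dependencies):
            // pass the array to com.lowagie.text.Image.getInstance(byte[]).
        }
    }
}
```

If the signature check fails, dumping the first few hundred bytes as text often shows an HTML error page or login form, which would point at the servlet or the server configuration rather than at iText.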
Re: [iText-questions] Itext doesn´t work with Barcode128 Font
From: psoa...@glintt.com To: itext-questions@lists.sourceforge.net Date: Sun, 7 Feb 2010 12:59:38 + Subject: Re: [iText-questions] Itext doesn´t work with Barcode128 Font This was already explained before: the font has a flag that says that it's a symbolic font and symbolic fonts can only have 256 character. iText could ignore that flag but it would then fail with really symbolic fonts because symbolic fonts expect a particular encoding. The font may work in Word because the encoding problems in an interactive application are not as relevant as in a PDF. This situation may have to be addressed by iText, after all it already fixes broken fonts and PDFs, but I've no idea if/when that will happen. At this point, it would be common to mention for no particular purpose that the source code is available and you can't predict when any of the interested users will check-in a fix. LOL. It sounds like you are just suggesting the code needs a // somewhere- often I get open source stuff and make private hardcoded modifications that would be of no use to anyone else but if it is a matter of adding a method that may be easy and reusable. Paulo - Original Message - From: Claudia Murialdo To: Post all your questions about iText here Sent: Sunday, February 07, 2010 12:24 AM Subject: Re: [iText-questions] Itext doesn´t work with Barcode128 Font 1) Yes it is ok. 2) But, Barcode 128 it just a font, isn't it?. So if I want to print only the character Š, It should be possible. Am I right?. Using itext, this character is not printed, however it is a valid character which is part of the table character of Barcode 128, i can see it in Character map utility of Window when I choose this font. This char, actually, a few characters, the last ones of the valid table character of Barcode128, are ignored when I use itext to print them. 
I tried the built barcode system of itext, generating images and the generated image is perfect for the original text (3309072963568700011355003017381600349594), but I need to do it using the font beacuse it is part of a generic program, and the program receives the coded text (they need to choose exactly what symbology to use, A, B, or C, they need that). Could you download the barcode I uploaded at http://www.usaupload.net/d/5qpza92olsd?. So you can see the problem. Thank you. Claudia. On Thu, Feb 4, 2010 at 4:45 PM, Mark Storer wrote: 1) Does your string contain the start/stop characters checksum already? If not, you won't see them. 2) Just because it's a valid string doesn't mean its a valid Barcode128 string. Each symbology has its own requirements. The online barcode generator at http://www.morovia.com/free-online-barcode-generator/ didn't seem to like your input string. Sseveral missing character characters appear in the text below the bars, and there's no telling what they're represented as in the graphic portion. iText has its own built in barcode system, I suggest giving it a shot. --Mark Storer Senior Software Engineer Cardiff.com #include typedef std::Disclaimer DisCard; -Original Message- From: Claudia Murialdo [mailto:cmuria...@gmail.com] Sent: Thursday, February 04, 2010 6:46 AM To: itext-questions@lists.sourceforge.net Subject: [iText-questions] Itext doesn´t work with Barcode128 Font I'm using itext to generate a PDF document using a true type font for BarCode128. The problem is that the start and stop characters are not printed. The text is: ‰A)'=_Xwè!-Wèèè1F0èBÀ~UŠ It corresponds to the string 3309072963568700011355003017381600349594 converted to Barcode 128. It is a valid string since I see it OK in Word and and browser and any several kind of editors. Why I cant see it ok the barcode generated using itext?. 
I uploaded the barcode 128 here: http://www.usaupload.net/d/5qpza92olsd

This is the code:

    Rectangle pageSize = new Rectangle(780, 525);
    Document document = new Document(pageSize);
    PdfWriter writer = PdfWriter.GetInstance(document, File.OpenWrite("Test.pdf"));
    document.Open();
    PdfContentByte cb = writer.DirectContent;
    // Load the barcode TrueType font with Identity-H encoding, embedded in the PDF
    BaseFont bf = BaseFont.CreateFont(@"C:\WINDOWS\Fonts\bcode128.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    cb.SetFontAndSize(bf, 50);
    cb.BeginText();
    // The string already contains the start character, data, and stop character
    cb.ShowTextAligned(Element.ALIGN_CENTER, "‰A)'=_Xwè!-Wèèè1F0èBÀ~UŠ", 200, 400, 0f);
    cb.EndText();
    document.Close();

Regards,
Claudia.

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
___
iText-questions mailing list
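Mark's first question in this thread (does the string already contain the start/stop characters and the checksum?) can be checked without any font or PDF library. The following is a minimal sketch in plain Java, assuming the digits are encoded entirely in Code 128 code set C (digit pairs). The class and method names are illustrative and are not part of iText. In Code 128 the check symbol value is the start symbol value plus the position-weighted sum of the data symbol values, modulo 103, and Start C has the value 105.

```java
// Illustrative helper, not part of iText: computes the Code 128 check
// symbol value for an all-digit string encoded with code set C.
final class Code128Checksum {
    private static final int START_C = 105; // symbol value of "Start Code C"

    // Returns the check symbol value (0..102) for an even-length digit string.
    static int checksumC(String digits) {
        boolean allDigits = digits.chars().allMatch(c -> c >= '0' && c <= '9');
        if (digits.isEmpty() || digits.length() % 2 != 0 || !allDigits) {
            throw new IllegalArgumentException("code set C needs an even number of digits");
        }
        int sum = START_C;
        for (int i = 0; i < digits.length(); i += 2) {
            int value = Integer.parseInt(digits.substring(i, i + 2)); // pair "07" has symbol value 7
            int position = i / 2 + 1;                                  // weights start at 1
            sum += position * value;
        }
        return sum % 103;
    }

    public static void main(String[] args) {
        // The digit string quoted in the thread.
        System.out.println(checksumC("3309072963568700011355003017381600349594"));
    }
}
```

Which glyph a given symbol value maps to depends on the character table of the particular barcode font, which the thread does not spell out, so comparing this value against the encoded text still requires the font vendor's mapping.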
Re: [iText-questions] Does not get an image from a servelt
Date: Fri, 5 Feb 2010 18:11:33 +0100
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Does not get an image from a servelt

Donald Jayawardena wrote:

Mike Marchywka hotmail.com writes:

The statement

    com.lowagie.text.Image iRiskNo = com.lowagie.text.Image.getInstance("http://localhost:8080/createjpg/createjpg");

gives an error as follows:

    SEVERE: owner frame is a java.awt.Frame

The createjpg servlet returns an image of an AWT frame. Does this mean that com.lowagie.text.Image.getInstance() cannot handle AWT frames?

??? What do you mean by this question? The more mails you send, the less people understand what you're talking about. If you want an answer, you'll have to stop confusing us and start giving us information about the problem that makes sense.

Having not looked at the source or your error handling approach, I finally decided this message could be something you emit (for lack of a more precise word) from somewhere in iText to wherever the OP can paste text. I take it you are not aware of any such possibility. Maybe you could just say that much, or explain where in iText this error comes from :)

The consensus seems to be, judging from a few comments and much silence, that no one knows where this message comes from, but the servlet engine may be a reasonable candidate. I had earlier speculated that someone could have turned an IO exception into an attempt to pop up a dialog box in the server JVM, which has no GUI. IIRC, it was claimed that this worked standalone or in some other test environment. The OP may have created an error handler based on immediate human feedback. The Chernobyl effect I mentioned earlier is the process of turning a simple problem into a much larger one through a series of questionable responses to each exceptional event.
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.1t3xt.com/docs/book.php
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/

_
Hotmail: Free, trusted and rich email service. http://clk.atdmt.com/GBL/go/201469228/direct/01/
Re: [iText-questions] Does not get an image from a servelt
Date: Thu, 4 Feb 2010 08:01:36 +0100
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Does not get an image from a servelt

Donald Jayawardena wrote:

Thanks for your solution. Once the main servlet (riskmap) has called the image servlet (createjpg), createjpg writes to a log file after generating the image. I can see that the riskmap call to createjpg is working fine, but the virtual image does not come back to the parent servlet (riskmap). The parent servlet does not give any error. How can I trap the stack trace when it does not return any exception?

You can't. The main question is: can the parent servlet (riskmap) be rewritten so that iText is not involved? Can it retrieve the byte[] with the image? I guess not, because iText doesn't do anything special. It just creates a URL object and calls openStream() to get the bytes of the Image... Although it does this multiple times; maybe you should first get the image bytes and then feed them to iText. Nevertheless: the error you're mentioning is not related to the problem. That's very strange.

I guess this always creates problems when designing an API. Do you expose something like

    public Widget makeWidget(HighRiskErrorProneThing x) throws Throwable

or do you trap exceptions internally, return null, and let the caller sort through the parameters in steps? In any case, breaking it up into higher-risk and lower-risk steps would be helpful. Any IO or user interaction is high-risk, as anything can happen. Presumably, once you have your bytes, iText will behave predictably based solely on the validity of the byte array. If you get back null or an exception, you can pass the byte array around and do various checks. By the way, was that thing trying to pop up a dialog box on the server?
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

_
Your E-mail and More On-the-Go. Get Windows Live Hotmail Free. http://clk.atdmt.com/GBL/go/201469229/direct/01/
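The advice in this message (first get the image bytes, then feed them to iText) splits the high-risk network step from the PDF step. Here is a minimal sketch of the first half in plain Java; the class and method names are made up for illustration, and the URL is whatever the createjpg servlet is deployed at. Once the bytes are in hand and look like a JPEG, they could be passed to iText's Image.getInstance(byte[]) overload instead of the URL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Illustrative helper: fetch the servlet response once and sanity-check it
// before any PDF library is involved.
final class ImageBytes {
    // Reads the stream to the end and returns everything as one byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Opens the URL a single time and returns the response body.
    static byte[] fetch(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return readAll(in);
        }
    }

    // A JPEG stream starts with the SOI marker FF D8.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null && data.length >= 2
                && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ImageBytes <image-url>");
            return;
        }
        byte[] data = fetch(args[0]);
        System.out.println(data.length + " bytes, JPEG header: " + looksLikeJpeg(data));
    }
}
```

If the fetch throws or the header check fails, the problem lies in the servlet or the container and iText never enters the picture, which is the separation both repliers are asking for.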
Re: [iText-questions] get image into pdf
Date: Wed, 3 Feb 2010 09:12:10 +0100
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] get image into pdf

Donald Jayawardena wrote:

Hi, I bought iText in Action but could not find the answer there.

And the answer won't be in the second edition either, because getting an image in a servlet is not an iText-related problem.

LOL. I can't remember exactly why, but many years ago this was a recurring issue on a servlet list (how do I do some complicated thing unrelated to servlets... using a servlet?).

My problem is that a PDF document is trying to receive an image from a servlet, but it receives null.

Can you get it working in a standalone example?

NetBeans (with Tomcat) gives the following error:

    SEVERE: -- returning Frame NULL
    SEVERE: BaseDialog: owner frame is a java.awt.Frame

BaseDialog? Frame? Definitely not an iText problem.

I'm not even sure what this has to do with a servlet. Is your servlet supposed to pop up a dialog box?

When I run the servlet alone, it produces an image.

What does this mean? Do you run it locally or call it from a standalone app?

So you can deploy the servlet and it works when you use:

The statements I use:

    Image iRiskNo = null;
    iRiskNo = Image.getInstance(new URL("http://localhost:8080/createjpg/createjpg"));

I don't know everything you are doing or where you are testing, but is localhost changing at any time?

Then it's definitely not an iText problem.

Please advise me what to do.

If it works when you deploy it on Tomcat and use it with a normal browser, but it doesn't work when you deploy it in NetBeans, then the problem is a NetBeans problem. (Some frame or dialog that can't be accessed?)

Servlets aren't really supposed to do this (at least they weren't many years ago). They are supposed to live for the lifetime of a connection to something, not to interact with humans themselves.
Re: [iText-questions] Does not get an image from a servelt
I think it would help if you could get a stack trace from an exception that involves iText. Use some other utility to verify that the URL you have is valid, etc. Develop a debug strategy for your servlets that catches everything and logs it to a file somewhere. I think that when you posted this earlier, no one knew what to make of your error report.

For the PDF page in the riskmap servlet, I have used the getInstance call as below:

    Image iRiskNo = null;
    iRiskNo = com.lowagie.text.Image.getInstance("http://localhost:8080/createjpg/createjpg");

When it runs the above statement, it gives the following error:

    SEVERE: -- returning Frame NULL
    SEVERE: BaseDialog: owner frame is a java.awt.Frame

Who is the "it" that you mention, and what is a BaseDialog? That call appears to return an Image, not something that obviously relates to your text above, even via Image.toString(). Is Lowagie saying this is severe? If your call throws an IOException or something, what does your code do? The error you are printing could be anything: someone could be trying to put up a dialog box, and of course the servlet probably isn't attached to a GUI, the severity being due to the Chernobyl effect acting on a common exception. Can you put a try/catch around the code and dump the stack somewhere, such as a log file? I forget the normal error handling approaches for servlets, but popping up a dialog isn't the first thought I would have.

In the createjpg servlet, the statements that generate the image are as follows:

    created a bufferedimage and ...
    ...
    JPEGEncodeParam eP = JPEGCodec.getDefaultJPEGEncodeParam(bufferedImage);
    eP.setQuality(1.0f, true);
    JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    encoder.encode(bufferedImage, eP);

Did you try to hit the URL you expect to contain your image using something like wget, and verify that the server is returning a valid image file with any headers that may be relevant?
I imported the following classes in the createjpg servlet:

    import java.awt.*;
    import java.awt.geom.*;
    import java.awt.image.*;
    import java.awt.Color;
    import java.awt.Font;
    import java.io.*;
    import java.io.IOException;
    import com.sun.image.codec.jpeg.JPEGEncodeParam;
    import com.sun.image.codec.jpeg.JPEGImageEncoder;
    import javax.servlet.http.*;
    import javax.servlet.*;
    import com.sun.image.codec.jpeg.JPEGCodec;

Are you worried about name conflicts? What does this tell us?

As I mentioned above, the riskmap servlet (PDF objects) does not receive images from the createjpg servlet. Can someone please let me know how I can troubleshoot or solve this problem?

Go to sun.com, find debugging strategies for servlets, and come back here with any stack traces that mention lowagie or itext.
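The request above (put a try/catch around the code and dump the stack to a log file) can be sketched with nothing but java.util.logging. This is an assumption-laden illustration rather than the poster's setup: the logger name, the log file name, and the callLogged helper are all hypothetical, and in a real servlet the wrapped step would be the Image.getInstance(...) call.

```java
import java.util.concurrent.Callable;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Illustrative helper: run one risky step and make sure any failure ends up
// in a log with its full stack trace instead of disappearing.
final class TraceLogging {
    private static final Logger LOG = Logger.getLogger("riskmap.debug"); // hypothetical logger name

    // Returns the step's result, or null if the step threw anything.
    static <T> T callLogged(String label, Callable<T> step) {
        try {
            return step.call();
        } catch (Throwable t) {
            // Passing the Throwable makes the logger record the stack trace.
            LOG.log(Level.SEVERE, label + " failed", t);
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        FileHandler handler = new FileHandler("servlet-debug.log", true); // hypothetical file, append mode
        handler.setFormatter(new SimpleFormatter());
        LOG.addHandler(handler);
        // Simulated failure standing in for the image-loading call.
        callLogged("load image", () -> {
            throw new java.io.IOException("simulated failure");
        });
        handler.close();
    }
}
```

A trace captured this way shows whether any com.lowagie frames are on the stack at all, which is exactly the information the list kept asking for.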
Re: [iText-questions] Chinese font in linux could not displayed.
Date: Fri, 29 Jan 2010 13:23:34 +0800
From:
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Chinese font in linux could not displayed.

Hi, all:

I use iText to export an SWT chart to a PDF file. The chart contains some Chinese characters, but all of them are displayed as blanks, even though I have used itextasian.jar in my project. I can now export the PDF file with Chinese characters correctly on Windows, but when the program runs on Linux (Ubuntu 9.10), the Chinese characters only display as blanks. Any reply is welcome.

You are getting the same silent failure mode in the first and last case? Generally, porting Java is quite simple unless you rely on some native support (JNI and a DLL, for example). However, jars usually end up in the wrong place or with missing permissions. Is there any way to get more diagnostics out, so that someone can complain about a font not being found, for example? If you are sure the jar is in the right place and there aren't any Java version problems (type `which java` and make sure it picks up something recent from Sun), try chmod 755 so everyone can execute it.
This is some of my code:

    BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(fileName));
    // convert chart to PDF with iText:
    Rectangle pagesize = new Rectangle(width, height);
    Document document = new Document(pagesize, 50, 50, 50, 50);
    try {
        PdfWriter writer = PdfWriter.getInstance(document, out);
        document.addAuthor("popjxc"); //$NON-NLS-1$
        document.open();
        PdfContentByte cb = writer.getDirectContent();
        PdfTemplate tp = cb.createTemplate(width, height);
        Graphics2D g2 = tp.createGraphics(width, height,
                new AsianFontMapper("STSongStd-Light", // support Chinese CharSet
                        "UniGB-UCS2-H"));
        Rectangle2D r2D = new Rectangle2D.Double(0, 0, width, height);
        piechart.draw(g2, r2D, null);
        g2.dispose();
        cb.addTemplate(tp, 0, 0);
        document.newPage();
        tp = cb.createTemplate(width, height);
        g2 = tp.createGraphics(width, height,
                new AsianFontMapper("STSongStd-Light", "UniGB-UCS2-H"));
        r2D = new Rectangle2D.Double(0, 0, width, height);
        barchart.draw(g2, r2D, null);
        g2.dispose();
        cb.addTemplate(tp, 0, 0);
    } finally {
        document.close();
    }
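The reply in this thread asks for more diagnostics before guessing at fonts. One cheap check, sketched below in plain Java, is whether the CJK resources are visible on the classpath of the Linux process at all and which JVM is actually running. The resource path is an assumption about how iTextAsian.jar is laid out (the directory iText 2.x loads its font resources from), and the class name is made up for illustration.

```java
// Illustrative diagnostic: reports whether a classpath resource is visible
// and which Java runtime is in use.
final class ClasspathCheck {
    // True if the named resource can be found by the current class loader.
    static boolean isOnClasspath(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resource) != null;
    }

    public static void main(String[] args) {
        // Assumed location of the CJK font table inside iTextAsian.jar.
        String cjk = "com/lowagie/text/pdf/fonts/cjkfonts.properties";
        System.out.println(cjk + " found: " + isOnClasspath(cjk));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home = " + System.getProperty("java.home"));
    }
}
```

If the resource is found on both machines, the jar is not the difference between Windows and Linux, and attention can move to the viewer and the fonts installed on the system where the PDF is opened.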
Re: [iText-questions] Chinese font in linux could not displayed.
Date: Fri, 29 Jan 2010 13:04:39 +0100
From:
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Chinese font in linux could not displayed.

Mike Marchywka wrote:

You are getting the same silent failure mode in first and last case?

If I understand correctly, whether or not Java is used is irrelevant. It's about the PDF. CJK fonts are never embedded, and it's perfectly normal that the glyphs don't show up if the font isn't available.

I think the OP claimed that adding a jar file fixed the behaviour in the one case. Doing nothing is fine for normal usage, but on either end, authoring or displaying, you really need tools that accept a -verbose option to explain what they are doing (or that at least provide source code so you can see for yourself).

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
Re: [iText-questions] Extract jpeg image color problem
Since no one else replied,

Date: Wed, 27 Jan 2010 12:58:57 +0200
From:
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] Extract jpeg image color problem

Hi

I can successfully extract a JPEG image from a PDF document, but the color is all messed up.

Did you do this with iText? In any case, can you post some code?

Any help would be appreciated

It depends what you mean by messed up. I'll assume this is not a well-known issue, so some details may help. In particular, is the color map shifted through the entire image (R and G swapped, for example), or does it change on each line? I've seen this a lot with various image formats and with lines whose length is not a multiple of N, since padding specs are often ambiguous and lower-level code may just do whatever the machine does. Is this a 64-bit machine, for example? Now that you mention it, I think I may have seen rendered pages from the open source viewer I used with color shifts (a uniform color table change that looks like an off-by-one RGB alignment issue). I'm used to seeing this from various sources, and it wasn't important at the time, so I didn't track it down. I have also noted that different image viewers can display the same JPG differently (presumably it is not quite right but still a JPG, LOL). Have you tried different viewers, or examined the JPG bytes to see what it should look like?

Thanks
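The suggestion to examine the JPG bytes can start with one number: how many color components the file declares. The thread never establishes the cause of the color problem, so this is only one thing worth ruling out, on the assumption that a four-component (CMYK-style) JPEG is handled differently by different decoders. The sketch below is plain Java with a made-up class name; it walks the marker segments after the SOI marker and reads the component count from the first start-of-frame (SOF) segment.

```java
// Illustrative helper: reads the component count from a JPEG's frame header.
final class JpegInfo {
    // Returns the number of color components declared in the first SOF
    // segment (1 = grayscale, 3 = usually YCbCr, 4 = usually CMYK or YCCK),
    // or -1 if the data is not a JPEG or no SOF segment is found.
    static int componentCount(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return -1; // missing SOI marker
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return -1; // not positioned on a marker
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // fill byte before a marker, skip it
                continue;
            }
            if (marker == 0xDA) {
                return -1; // start of scan reached without a frame header
            }
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            boolean sof = marker >= 0xC0 && marker <= 0xCF
                    && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
            if (sof) {
                // marker(2) + length(2) + precision(1) + height(2) + width(2)
                int idx = pos + 9;
                return idx < jpeg.length ? (jpeg[idx] & 0xFF) : -1;
            }
            pos += 2 + length; // jump to the next segment
        }
        return -1;
    }
}
```

A result of 3 points away from this explanation and back toward the decoding or row-padding issues described in the reply; a result of 4 suggests trying a viewer or decoder known to handle CMYK JPEGs.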
Re: [iText-questions] PDFTable issue
Date: Mon, 25 Jan 2010 12:39:09 +0100
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] PDFTable issue

Hi ...

I have a problem with PDFTable under Java. When I try to create a cell with colspan, it works. The same goes for rowspan. But if I try to enable both of them on one cell, Java throws a NullPointerException. For example, I can't create a table like this:

Do you have a stack trace or something? You may get lucky and someone will know of a common issue, but if you could post the details of the exception, someone who happens to be browsing the source code may be able to help you too.

If there is any solution, please write it to me! Thanks for your help!

Adam Sandor

--
Throughout its 18-year history, RSA Conference consistently attracts the world's best and brightest in the field, creating opportunities for Conference attendees to learn about information security's most important issues through interactions with peers, luminaries and emerging and established companies. http://p.sf.net/sfu/rsaconf-dev2dev
Re: [iText-questions] Reg : pdf to jpeg conversion
Date: Fri, 22 Jan 2010 14:33:24 +0100
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Reg : pdf to jpeg conversion

Murali Jillella wrote:

Is it possible to get the contents of the image into a byte array?

No, when you create an Image this way, you're creating a Form XObject, not an Image XObject. Big difference!

I have noticed that jpeg.getOriginalData() returns null. Why?

Because there is no original data; only PDF syntax. iText doesn't do PDF to JPG conversions.

I would suggest an open source renderer. In response to your question I started looking around sun.com, and maybe there are some simple things that work with JMF or JAI for this. I couldn't tell, as these were largely forum posts, and then I decided it wasn't that big a deal to me. You may find something simple over there, though; they keep adding stuff to Java.

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
Re: [iText-questions] Number of page after merging pdf file
Date: Tue, 19 Jan 2010 18:21:05 +0100
From:
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Number of page after merging pdf file

Degla Degla wrote:

I'm using a method to merge PDF documents, but I can't find a way to generate correct page numbers in the footer. Let me explain: when I merge PDF files, the resulting document keeps the original page numbers rather than a continuous numbering (it should run from 1 to n, but I have many pages numbered 1).

Your PDF is like a book. Merging is like taking photocopies. You can't remove the original page number. That's inherent to PDF, didn't you know?

I thought Leonard had many comments about preserving the logical structure of documents. Isn't there any standard way in which a PDF authoring tool could tell everyone else "this character here is a page number with value foo"? Presumably such a facility would let someone manipulating the document manipulate identifiable things. If you were set on preserving logic, structure, and information coherence while producing a cute picture that the boss likes, what options would you have? Does the iText book discuss this at all? It could save a lot of people from dead ends. Thanks.

You could add new page numbers, though.

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
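The closing remark in this thread (you could add new page numbers) implies working out a continuous numbering across the merged files. The arithmetic is sketched below in plain Java with a hypothetical helper; it only produces the footer text for each page of the merged document, given the page count of each source file in merge order. Drawing that text onto the pages is a separate step for the PDF library, and the old printed numbers remain part of the page content unless they are covered.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: continuous footer labels for a merged document.
final class MergedPageLabels {
    // pageCounts holds the number of pages of each source file, in merge order.
    // The result has one entry per page of the merged document.
    static List<String> labels(int... pageCounts) {
        int total = 0;
        for (int count : pageCounts) {
            if (count < 0) {
                throw new IllegalArgumentException("page count must not be negative");
            }
            total += count;
        }
        List<String> out = new ArrayList<>(total);
        int page = 0;
        for (int file = 0; file < pageCounts.length; file++) {
            for (int original = 1; original <= pageCounts[file]; original++) {
                page++;
                // Keep the source position so old and new numbers can be compared.
                out.add("Page " + page + " of " + total
                        + " (file " + (file + 1) + ", original page " + original + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        labels(2, 1, 3).forEach(System.out::println);
    }
}
```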
Re: [iText-questions] Get table from PDF IText
Date: Sat, 16 Jan 2010 11:04:12 +0100
From: i...@1t3xt.info
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Get table from PDF IText

aro1982 wrote:

Is there any solution to get a table (structure, cell contents) from an existing PDF? I've created some PDFs with PdfPTable and I want to get this table back from the file. I've tried it in many ways, but it is very difficult to do and I can't find any examples that can help me.

That's because you're trying something that is impossible.

Are there qualifications or alternative approaches here? That is, Leonard has sometimes offered that it is possible to preserve the logical structure in a document, so that people who want to use computers to automate data processing, instead of just looking at pictures, can do so with a PDF file, with varying amounts of effort. If an ambitious PDF author wanted to allow a user to extract a CSV file equivalent to his table, with just the data and without all the formatting junk, how might he go about designing the document? Data generally comes into forms either from manual entry (typing) or from some other source in a character format, not garbled into pixels using an arbitrary font. It would be nice in many cases to preserve this information. Thanks.

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
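On the question of how an author could let readers recover a CSV equivalent of a table: one low-tech option, which is an assumption here and not something the thread proposes, is for the generating program to write a CSV file from the same rows it uses to fill the PdfPTable and ship the two together. The sketch below is plain Java with made-up names and covers only the CSV side, using the usual quoting rules (fields containing commas, quotes, or line breaks are quoted, and embedded quotes are doubled).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative helper: turn table rows (kept as data) into CSV lines.
final class CsvRows {
    // Quotes a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Joins the escaped fields of one row with commas.
    static String toLine(List<String> row) {
        return row.stream().map(CsvRows::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical itemized-bill row, for illustration only.
        System.out.println(toLine(Arrays.asList("2010-01-16", "Call, national", "0.35")));
    }
}
```

The design point is that the rows exist as data before they become a picture of a table, so nothing has to be reverse-engineered from the PDF afterwards.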