Re: [iText-questions] performance follow up
Hello,

On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:

> Don't know if it'll make any difference, but the way you are reading the
> file is horribly inefficient. If the code you wrote is part of your test
> times, you might want to re-try, but using this instead (I'm just tossing
> this together - there might be typos):
>
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     byte[] buf = new byte[8092];
>     int n;
>     while ((n = is.read(buf)) >= 0) {
>         baos.write(buf, 0, n);
>     }
>     return baos.toByteArray();

I tried your suggestion above and it made no significant difference compared to letting iText do the loading. The fastest I could get my use case to work with this pre-loading concept was by loading the whole file in one shot using the code below.

With the cumulative patch applied plus preloading the whole PDF using the code below, my original test case now performs 7.74% faster than before, roughly 22% away from the competitor now ...

By the way, the average response times I was getting:

- 77ms for the original unchanged test case on the office multi-processor, multi-core workstation
- 15ms for the original unchanged test case at home on my MBP

I attribute the huge difference between those two similar experiments mainly to having an SSD drive in my MBP ... the top hot spots reported by the profiler are related one way or another to IO, so it would be no wonder that with an SSD drive the response time improves by a factor of 5x. There are other differences though, e.g. OS and JVM version.

Best regards,
Giovanni

    private static byte[] file2ByteArray(String filePath) throws Exception {
        InputStream input = null;
        try {
            File file = new File(filePath);
            input = new BufferedInputStream(new FileInputStream(filePath));
            byte[] buff = new byte[(int) file.length()];
            input.read(buff);
            return buff;
        } finally {
            if (input != null) {
                input.close();
            }
        }
    }

--
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
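A note on the file2ByteArray helper above: a single InputStream.read(byte[]) call is allowed to return before the buffer is full, so for a large file the tail of the array can be left as zeros. The following is only a sketch of one way to guard against that, using DataInputStream.readFully from the JDK. The class name FileLoader is made up for illustration and is not part of the thread or of iText.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper class, named here only for illustration.
final class FileLoader {

    /** Reads the whole file into a byte array, or fails if it cannot be read completely. */
    static byte[] file2ByteArray(String filePath) throws IOException {
        File file = new File(filePath);
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single byte array: " + filePath);
        }
        byte[] buff = new byte[(int) length];
        DataInputStream input = new DataInputStream(new FileInputStream(file));
        try {
            // readFully loops internally until the buffer is full,
            // and throws EOFException if the stream ends early.
            input.readFully(buff);
        } finally {
            input.close();
        }
        return buff;
    }
}
```

The BufferedInputStream wrapper is left out on purpose: when the destination buffer is already as large as the file, an intermediate buffer adds a copy without saving any reads.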
[iText-questions] iText and Substance together
I am using both iText and the Substance look-and-feel together in a commercial application. When using iText to output a PDF of a screenshot of the entire application, it *almost* works. All GUI elements print fairly well with the exception of cells in a JTable.

For cells in a JTable, if the table cell font is not bold, the text is placed into the PDF document but is invisible in the PDF (either because the color is the same as the background, or it is transparent, or something of that sort). If the table cell font is bold, the text is placed into the PDF as a bitmap image, but as a weird outline.

A very short program which can reproduce this bug, as well as more information, is posted on the Substance discussion board here:

https://substance.dev.java.net/servlets/ProjectForumMessageView?forumID=1484&messageID=35746

While I doubt anyone on either board will have an answer, I live in hope! Thanks.

This problem is identical with old (2.1.7) and new (5.0.2) versions of iText and with old (5.3) or new (6.0) versions of Substance.

--
View this message in context: http://old.nabble.com/iText-and-Substance-together-tp28346191p28346191.html
Sent from the iText - General mailing list archive at Nabble.com.
Re: [iText-questions] performance follow up
> From: brave...@gmail.com
> Date: Sat, 24 Apr 2010 13:05:26 +0200
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] performance follow up
>
> Hello,
>
> On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
>
>> Don't know if it'll make any difference, but the way you are reading the
>> file is horribly inefficient. If the code you wrote is part of your test
>> times, you might want to re-try, but using this instead (I'm just tossing
>> this together - there might be typos):
>>
>>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>     byte[] buf = new byte[8092];
>>     int n;
>>     while ((n = is.read(buf)) >= 0) {
>>         baos.write(buf, 0, n);
>>     }
>>     return baos.toByteArray();
>
> I tried your suggestion above and it made no significant difference
> compared to letting iText do the loading. The fastest I could get my use
> case to work with this pre-loading concept was by loading the whole file
> in one shot using the code below.

If, as indicated below, you are generally IO limited, don't throw the code out yet. If you must copy data, you want to use array-based methods as often as possible, but the first preference is to avoid copies, unless of course you are strategically preloading or something. I often just turn everything into a byte array, but obviously this doesn't scale too well unless you are content to let the VM do your swapping for you. Ideally you would just load what you need in a just-in-time fashion to avoid tying up idle RAM.

> With the cumulative patch applied plus preloading the whole PDF using the
> code below, my original test case now performs 7.74% faster than before,
> roughly 22% away from the competitor now ...
>
> By the way, the average response times I was getting:
>
> - 77ms for the original unchanged test case on the office
>   multi-processor, multi-core workstation
> - 15ms for the original unchanged test case at home on my MBP
>
> I attribute the huge difference between those two similar experiments
> mainly to having an SSD drive in my MBP ... the top hot spots reported by
> the profiler are related one way or another to IO, so it would be no
> wonder that with an SSD drive the response time improves by a factor of
> 5x. There are other differences though, e.g. OS and JVM version.

Multi-proc and disk cache can cause some confusion. I wouldn't ignore the task manager for some initial investigation: if the CPU drops and the disk light comes on, you are likely to be disk limited. With IO it is easy to get nickel-and-dimed to death, as everyone who relays the data can be low on the profile chart, but it adds up. Wall-clock times are least susceptible to manipulation and may be best for A-B comparisons if you have control over the other stuff running on the machine (cash flow versus pro-forma earnings, LOL).

If you can subclass the random access file thing, you may be able to first collect statistics and then write something that can see into the future a few milliseconds. All the generic caches work on past results, things like MRU, except maybe the prefetch, which assumes you will continue to do sequential memory accesses. If you are in a position to make forward-looking statements that have a material impact on your performance (ROFL), you may be able to do much better.
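The "collect statistics first" idea from the message above can be sketched roughly as follows. iText's own random-access class is not shown in this thread, so this example assumes plain java.io.RandomAccessFile instead. The class name CountingRandomAccessFile and its counters are hypothetical, and it only counts seeks and bytes read rather than predicting anything.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical instrumentation class; not part of iText.
class CountingRandomAccessFile extends RandomAccessFile {
    private long seekCount;
    private long readCalls;
    private long bytesRead;

    CountingRandomAccessFile(File file) throws IOException {
        super(file, "r");
    }

    @Override
    public void seek(long pos) throws IOException {
        seekCount++;               // every explicit repositioning is recorded
        super.seek(pos);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        readCalls++;
        if (b >= 0) {
            bytesRead++;           // -1 means end of file, nothing was read
        }
        return b;
    }

    @Override
    public int read(byte[] buf) throws IOException {
        // Route through the three-argument overload so the call is counted once.
        return read(buf, 0, buf.length);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        readCalls++;
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }

    long getSeekCount() { return seekCount; }
    long getReadCalls() { return readCalls; }
    long getBytesRead() { return bytesRead; }
}
```

A high seek count combined with a small average read size (getBytesRead() divided by getReadCalls()) would be one hint that the access pattern is dominated by random access, which is where a rotating disk and an SSD are expected to differ most.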
[iText-questions] Extracting Comments
Hi itext:

I want to parse the comments from a PDF file using iText. Any ideas?

Regards,
Vajahat
Re: [iText-questions] iText 5.0.1 embedded fonts and smartcopy
Anybody know why this happens? It seems wrong to see the same font embedded multiple times yet end up with a smaller document. Short of the Document Props > Fonts list scrolling forever, is there any harm in letting it embed the subset? My main goal was to reduce file size after concat'ing several (~30,000) PDFs.

Jason

-----Original Message-----
From: Jason Berk [mailto:jb...@purdueefcu.com]
Sent: Fri 4/23/2010 5:14 PM
To: Post all your questions about iText here
Subject: [iText-questions] iText 5.0.1 embedded fonts and smartcopy

I have three fonts which each contain 1 glyph. I created 100 identical PDFs that use this font and then used SmartCopy to merge all 100 pages. The resulting PDF is 184KB, and when I look at the document properties, it shows the font 100 times (presumably because it was an embedded subset).

I added myFont.setSubset(false); and reran the test. Now when I view the properties of the merged PDF, I only see my font once (as expected), yet the size of my merged PDF grew to 327KB! (not expected)

As I understood it, SmartCopy didn't reuse fonts that were subsets.

    public class Fonts {
        public static final Font VISA;
        public static final Font SCORECARD;
        public static final Font MICR;

        static {
            BaseFont _visa = null;
            BaseFont _scorecard = null;
            BaseFont _micr = null;
            try {
                _visa = BaseFont.createFont("/fonts/CREDITCARD.ttf", BaseFont.WINANSI, BaseFont.EMBEDDED);
                _visa.setSubset(false); // INCREASES FILE SIZE?!?!
                _scorecard = BaseFont.createFont("/fonts/SCORECARD.ttf", BaseFont.WINANSI, BaseFont.EMBEDDED);
                _scorecard.setSubset(false); // INCREASES FILE SIZE?!?!
                _micr = BaseFont.createFont("/fonts/OCRAEXT.ttf", BaseFont.WINANSI, BaseFont.EMBEDDED);
                _micr.setSubset(false); // INCREASES FILE SIZE?!?!
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
            VISA = new Font(_visa, 12);
            SCORECARD = new Font(_scorecard, 12);
            MICR = new Font(_micr, 12);
        }
    }

    private void generateStatements() {
        try {
            log.info("begin generating statements");
            Document d = new Document();
            PdfSmartCopy copy = new PdfSmartCopy(d, new FileOutputStream("C:/temp/aMerged.pdf"));
            d.open();
            for (int i = 1; i <= 100; i++) {
                Document document = new Document();
                PdfWriter.getInstance(document, new FileOutputStream("C:/temp/test" + i + ".pdf"));
                document.open();
                document.add(new Paragraph("LARGE FONTS", Fonts.NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.LARGE_NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.LARGE_BOLD));
                document.add(new Paragraph("testing our font class", Fonts.LARGE_UNDERLINE));
                document.add(new Paragraph("testing our font class", Fonts.LARGE_ITALIC));
                document.add(new Paragraph("\n\nNORMAL FONTS", Fonts.NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.BOLD));
                document.add(new Paragraph("testing our font class", Fonts.UNDERLINE));
                document.add(new Paragraph("testing our font class", Fonts.ITALIC));
                document.add(new Paragraph("\n\nSMALL FONTS", Fonts.NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.SMALL_NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.SMALL_BOLD));
                document.add(new Paragraph("testing our font class", Fonts.SMALL_UNDERLINE));
                document.add(new Paragraph("testing our font class", Fonts.SMALL_ITALIC));
                document.add(new Paragraph("\n\nCOLORED FONTS", Fonts.NORMAL));
                document.add(new Paragraph("testing our font class", Fonts.PEFCU_RED_NORMAL));
                document.add(new Paragraph("\n\nWHITE FONTS", Fonts.NORMAL));
                Chunk chunk = new Chunk("testing our font class", Fonts.WHITE_NORMAL);
                chunk.setBackground(Colors.BLACK);
                document.add(new Paragraph(chunk));
                Chunk chunk2 = new Chunk("testing our font class", Fonts.WHITE_BOLD);
                chunk2.setBackground(Colors.BLACK);
                document.add(new Paragraph(chunk2));
                document.add(new
Re: [iText-questions] performance follow up
If the file is being entirely pre-loaded, then I doubt that IO blocking is a significant contributing factor to your test.

I think that the best clue here may be the difference between performance with form flattening and without form flattening. Just to confirm, am I right in saying that iText outperforms the competitor by a significant amount in the non-flattening scenario?

If that's the case, then it seems like we should see significant differences in the profiling results between the flattening and non-flattening scenarios in iText. Would you be willing to post the profiling results for both cases so we can see which code paths are consuming the most runtime in each?

Another possibility, if the profiling results show similar hotspots, is that the form flattening algorithms in iText are using the hotspot areas a lot more than in the non-flattening case. There may be a bunch of redundant reads or something in the flattening case.

Let's take a look at the profiling results and see if we can draw any conclusions about where to go next.

BTW - which profiler are you using? Are you able to expand each of the hotspot code paths and see the actual call path that is causing the bottleneck? I use jvvm, and the results of expanding the hotspot call trees can be quite illuminating.

What I really would like is to get ahold of your two benchmark tests (with and without flattening) so I can run them on my system - do you have anything you can package up and share?

- K

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28352147.html
Sent from the iText - General mailing list archive at Nabble.com.
Re: [iText-questions] performance follow up
Isn't there something in PDF about linearization? (The term comes up as a suggestion on Google, LOL.) How can you compare the two resulting PDFs in terms of dynamic attributes or arbitrary ordering of some items? Given the issues with IO and access patterns, this could matter. In fact, you could even imagine that if you could reorder some things you would get a win-win for creation and future rendering time. What is the extent of the freedom here? It sounds like any hints you would generate for the reader could be used during document manipulation in iText.

> Date: Sat, 24 Apr 2010 11:59:14 -0700
> From: forum_...@trumpetinc.com
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] performance follow up
Re: [iText-questions] performance follow up
On Apr 24, 2010, at 8:59 PM, trumpetinc wrote:

> If the file is being entirely pre-loaded, then I doubt that IO blocking is
> a significant contributing factor to your test.

After I did the entire pre-loading, taking the entire file at once, the benchmarks look better, yes, meaning there is some bottleneck in the way iText handles the loading of PDF files. Besides, changing to a different storage, i.e. from the non-SSD drive in the office to the SSD in my laptop, shows a performance improvement by a factor of 5x. Of course there could be other reasons, but I would be willing to bet that this 5x speedup is by a high margin due to the fast SSD. If there is something SSDs are really good at, it is random access, and iText is doing that, and a lot.

Benchmarking the alternative on my laptop shows:

alternative mean RT: 18ms
iText mean RT: 14ms

So on my laptop iText is faster than the alternative ... why? I think because of random access. If iText is doing a lot of random access, that could slow it down on a non-SSD drive like the one I have in the office. I have to benchmark iText again in the office to see how it performs with the new load-the-entire-file strategy. Because of these variations I will set up the experiment on the actual hardware where it will be deployed.

> I think that the best clue here may be the difference between performance
> with form flattening and without form flattening. Just to confirm, am I
> right in saying that iText outperforms the competitor by a significant
> amount in the non-flattening scenario?
>
> If that's the case, then it seems like we should see significant
> differences in the profiling results between the flattening and
> non-flattening scenarios in iText. Would you be willing to post the
> profiling results for both cases so we can see which code paths are
> consuming the most runtime in each?

I posted this yesterday, see http://old.nabble.com/more-on-performance-td28346917.html

- FOOTER 4x shows the hot spot profiler results for the loading and flattening case
- HEADER 4x shows the hot spot profiler results for loading only

> BTW - which profiler are you using? Are you able to expand each of the
> hotspot code paths and see the actual call path that is causing the
> bottleneck? I use jvvm, and the results of expanding the hotspot call
> trees can be quite illuminating.

I am using JProfiler. I can expand the hot spots; it shows the full call trees leading to each hot spot.

> What I really would like is to get ahold of your two benchmark tests (with
> and without flattening) so I can run it on my system - do you have
> anything you can package up and share?

I will prepare it for you ...

Best regards,
Giovanni
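The mean response times quoted in this message are wall-clock averages. The thread does not show how they were collected, so the following is only a sketch of one common way to produce such a figure: run the workload a few times untimed to warm up the JVM, then average System.nanoTime() deltas over the measured runs. The class name MeanResponseTime and its parameters are assumptions made for illustration.

```java
// Hypothetical timing helper; not the benchmark used in this thread.
final class MeanResponseTime {

    /**
     * Runs the task warmupRuns times without timing it, then measuredRuns
     * times with timing, and returns the mean wall-clock time in milliseconds.
     */
    static double meanMillis(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run();                       // lets the JIT and caches settle
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / measuredRuns;
    }
}
```

For an A-B comparison the same helper would be called once per variant, for example once with a task that loads and flattens a form and once with a task that only loads it, on the same machine and with nothing else competing for the disk.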