Thank you for the suggestions on how to tackle my problem. I had reviewed the PdfBox and iText literature, hoping there might be a more trivial approach, but it seems to be confirmed that I will have to parse the PDF at a low level, moving the text across while rendering the non-text objects to an appropriately sized bitmap. I am a programmer, so it's not out of the question; it's just a big investment in time I had hoped to avoid :)

Also, I am looking at Apago's PDF Enhancer. It looks like it should do what I want (and more). I have not gotten the selective rasterization to do what I want yet; it seems to rasterize the text too, but that is probably my mistake. I will seek assistance from Apago.
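For the low-level parsing route, a rough sketch of just the splitting step might look like the following. It is only an illustration under loud assumptions: the page content stream is already decompressed, text lives entirely inside BT ... ET text objects, and no string literal or inline image contains a standalone BT or ET token. The name split_content_stream is hypothetical and is not part of iText or PdfBox. Graphics state set outside the text objects (q/Q, cm, clipping, font resources) is ignored here, and a real tool would have to carry it over.

```python
import re

# One text object: a standalone BT token through the next standalone ET token.
# Assumption: "standalone" means delimited by whitespace or the stream ends.
TEXT_OBJECT = re.compile(
    rb"(?<![^\s])BT(?![^\s]).*?(?<![^\s])ET(?![^\s])", re.DOTALL
)


def split_content_stream(stream: bytes) -> tuple[bytes, bytes]:
    """Return (text_only, graphics_only) versions of a content stream."""
    # Collect every BT ... ET block unchanged, in document order.
    text_parts = [m.group(0) for m in TEXT_OBJECT.finditer(stream)]
    # Whatever is left after cutting those blocks out is the non-text content.
    graphics_only = TEXT_OBJECT.sub(b"", stream)
    return b"\n".join(text_parts), graphics_only


sample = (
    b"q 1 0 0 1 0 0 cm /Im1 Do Q\n"
    b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n"
    b"0 0 100 100 re f\n"
)
text_only, graphics_only = split_content_stream(sample)
```

The graphics_only part would then go to whatever renderer produces the low-res bitmap, and the text_only part would be written on top of that bitmap in the new page.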
-D

On Thu, Mar 26, 2009 at 6:03 AM, Mike Marchywka <[email protected]> wrote:

> > The ONLY company that offers this technology is Apago (http://www.apago.com) in their PDF Enhancer product. The technology is called “Selective Rasterization” and works quite nicely (of course, I designed/architected it and wrote the associated code – so I am kind of biased ;).
>
> Why is this so difficult? What patents or publications came out of this effort? Can you outline the issues as I would think with html you could just swap out images. Is there a key objective this would not accomplish?
>
> I have to buy a product that took a heroic effort to produce just to let me integrate data from multiple sources and this is an "enhancement?" LOL
>
> http://www.apagoinc.com/documents/PR-PDF-Enhancer-30.pdf
>
> > It is NOT a trivial piece of work – which is one of the reasons why no one else offers it…it took me a sizable amount of time, and I’ve been doing PDF for >10 years.
>
> After reading more of the spec documents, I was hoping to find an opportunity to congratulate Adobe for distinguishing target content from "artifacts" and in fact labeling formatting "junk" with a well chosen term :) The comments in the spec about reflowing and the importance of logical structure make it sound like there is the potential here for a reasonably well authored document to appeal to both the automated data processor and the viewer-of-nice-pictures.
>
> However, in response to your above post, I now have to ask, "What other simple tasks related to compacting the pictures and highlighting the information require heroic efforts once customers commit to PDF formats?" LOL. With html you could just swap out the images.
>
> [ same rant I've posted 10 times, different words however ]
>
> It sounds like you consider this a design feature. That is great but then you end up with situations like the US IRS offering documents to people who are unable to extract their own tax numbers from the artwork because no one enabled "user rights" or found some other features for those with proprietary interests. This isn't just with high-volume public documents as apparently special-field authors interested in letting people cite their work accurately ( using computer tools to automate the process without retyping ) aren't always aware ( or at least were not aware in recent past ) of your security/encryption "features" either. I'm beginning at least to believe your comments that this is a problem with customers and not inherent entirely in the PDF as an information communication format but I don't get the idea you do a lot to educate them or make the features well known ( apparently Reader could just as easily offer a greyed-out option when viewing the US IRS 1040 forms so a user like myself could complain to the IRS to "enable user rights" instead of blaming adobe for creating an information black hole but either you or customers don't want to let users know it is even possible ).
>
> Again, this typifies a big concern selling public agencies on your picture format when many people outside of Adobe or MSFT tools/mindset try to use the information in new ways, especially when information from many sources needs to be extracted and analyzed together. I don't have an algebra for pictures but I've managed to get pretty good with integer math...
>
> > I'll even concede that "along with freedom comes responsibility" and if you offer a versatile format it can be difficult to sell the right defaults to every customer but in this case it seems the format lacks some versatility ( versatility has to be realizable, not just hypothetical and it sounds like this comparatively simple task is not simple).
> >
> > Leonard
> >
> > From: Doug Moreland [mailto:[email protected]]
> > Sent: Wednesday, March 25, 2009 6:16 PM
> > To: [email protected]
> > Subject: [iText-questions] How to duplicate PDF text but rasterize graphics
> >
> > Looking for advice on the best approach to do something others may have tried. I have PDFs with text and graphics. They are very large, due to the graphics. I want to read each PDF and produce a new PDF with the text just as it was in the original, but rasterize the rest of the graphics into a fairly low res bitmap to be added behind the text, reducing the overall filesize of the bitmap. I do not need to manipulate the text, just replicate it. Everything else can be bitmapped.
> >
> > Any starting places would be greatly appreciated. thank you.

--
Doug Moreland
------------------------------------------------------------------------------
_______________________________________________ iText-questions mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/itext-questions Buy the iText book: http://www.1t3xt.com/docs/book.php
