Re: [iText-questions] How to duplicate PDF text but rasterize graphics

Doug Moreland Thu, 26 Mar 2009 06:45:48 -0700

Thank you for the suggestions on how to tackle my problem. I had
reviewed the PDFBox and iText literature, hoping there might be a more trivial
approach, but it seems to be confirmed that I will have to parse the PDF (at
a low level), moving the text while rendering the non-text objects to a
bitmap sized appropriately. I am a programmer, so it's not out of the
question; it's just a big investment in time I had hoped to avoid :)
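
As a rough illustration of the "parse the PDF at a low level" step, here is a minimal Java sketch (standard library only; the class and method names are hypothetical and are not part of iText or PDFBox). It assumes the page content stream has already been decompressed into a String, for example by a PDF library, and sorts its operations into those inside text objects (delimited by the BT and ET operators in the PDF specification) and everything else. It deliberately ignores inline images (BI ... ID ... EI), Form XObjects, and graphics/text state set outside BT ... ET (q/Q, cm, or a Tf placed before BT). A real tool has to carry that state along with the text to place it correctly, which is a large part of why this is not a trivial job.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not an iText or PDFBox API): sorts the operations of a
 * DECODED page content stream into those inside text objects (BT ... ET) and
 * everything else. Sketch only: inline images (BI ... ID ... EI), Form XObjects
 * and graphics/text state set outside BT ... ET are not handled.
 */
public class ContentSplitter {

    private static final String DELIMITERS = "()<>[]{}/%";

    /** Returns [textOps, otherOps]; each entry is "operands... operator". */
    public static List<List<String>> split(String content) {
        List<String> textOps = new ArrayList<>();
        List<String> otherOps = new ArrayList<>();
        StringBuilder operands = new StringBuilder();
        boolean inText = false;
        int n = content.length();
        int i = 0;
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // approximates PDF white space
            if (c == '%') { // comment: skip to end of line
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') i++;
                continue;
            }
            int start = i;
            if (c == '(') { // literal string: may nest and contain backslash escapes
                int depth = 0;
                while (i < n) {
                    char d = content.charAt(i++);
                    if (d == '\\') i++; // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')' && --depth == 0) break;
                }
                i = Math.min(i, n);
            } else if (content.startsWith("<<", i) || content.startsWith(">>", i)) {
                i += 2; // dictionary delimiters
            } else if (c == '<') { // hex string
                int close = content.indexOf('>', i);
                i = (close < 0) ? n : close + 1;
            } else if ("[]{}".indexOf(c) >= 0) {
                i++; // array / procedure delimiters
            } else { // name (/F1), number, keyword or operator
                i++;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && DELIMITERS.indexOf(content.charAt(i)) < 0) i++;
            }
            String token = content.substring(start, i);
            if (isOperand(token)) {
                operands.append(token).append(' ');
                continue;
            }
            String op = operands + token; // operator with its preceding operands
            operands.setLength(0);
            if (token.equals("BT")) inText = true;
            (inText ? textOps : otherOps).add(op);
            if (token.equals("ET")) inText = false;
        }
        return List.of(textOps, otherOps);
    }

    private static boolean isOperand(String token) {
        return "/(<>[]{}".indexOf(token.charAt(0)) >= 0
                || token.equals("true") || token.equals("false") || token.equals("null")
                || token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    public static void main(String[] args) {
        String content = "q 0 0 1 rg 10 10 200 100 re f Q\n"
                + "BT /F1 12 Tf 72 700 Td (Hello \\(world\\)) Tj ET\n"
                + "q 100 0 0 50 300 400 cm /Im1 Do Q";
        List<List<String>> parts = split(content);
        System.out.println("text:  " + parts.get(0));
        System.out.println("other: " + parts.get(1));
    }
}
```

Running main prints `text:  [BT, /F1 12 Tf, 72 700 Td, (Hello \(world\)) Tj, ET]` and `other: [q, 0 0 1 rg, 10 10 200 100 re, f, Q, q, 100 0 0 50 300 400 cm, /Im1 Do, Q]`. The "other" list is what would be handed to a renderer for the background bitmap. Notice that the text list on its own has lost any `cm` in effect outside BT ... ET, so it cannot simply be copied into a new page as-is.
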
Also, I am looking at Apago's PDF Enhancer. It looks like it should do what
I want (and more). I have not yet gotten the selective rasterization to do what I
want; it seems to rasterize the text too, but that is probably my mistake. I will
seek assistance from Apago.


-D

On Thu, Mar 26, 2009 at 6:03 AM, Mike Marchywka <[email protected]> wrote:

>
> >
> > The ONLY company that offers this technology is Apago (http://www.apago.com),
> > in their PDF Enhancer product. The technology is called “Selective
> > Rasterization” and works quite nicely (of course, I designed/architected it
> > and wrote the associated code, so I am kind of biased ;).
> >
> Why is this so difficult? What patents or publications came out of this
> effort? Can you outline the issues? I would think that with HTML you could
> just swap out the images. Is there a key objective this would not accomplish?
>
> I have to buy a product that took a heroic effort to produce
> just to let me integrate data from multiple sources,
> and this is an "enhancement?" LOL
>
> http://www.apagoinc.com/documents/PR-PDF-Enhancer-30.pdf
>
> >
> >
> > It is NOT a trivial piece of work, which is one of the reasons
> > why no one else offers it... It took me a sizable amount of time, and I’ve
> > been doing PDF for more than 10 years.
> >
>
> After reading more of the spec documents, I was hoping to find an
> opportunity to congratulate Adobe for distinguishing target content from
> "artifacts", and in fact for labeling formatting "junk" with a well-chosen
> term :) The comments in the spec about reflowing and the importance of
> logical structure make it sound like there is the potential here for
> a reasonably well-authored document to appeal to both the automated
> data processor and the viewer-of-nice-pictures.
>
> However, in response to your post above, I now have to ask: "What other
> simple tasks related to compacting the pictures and highlighting
> the information require heroic efforts once customers commit to PDF
> formats?" LOL. With HTML you could just swap out the images.
>
>
> [ same rant I've posted 10 times, different words however ]
>
> It sounds like you consider this a design feature. That is great, but
> then you end up with situations like the US IRS offering documents to
> people who are unable to extract their own tax numbers from the artwork
> because no one enabled "user rights" or found some other features for
> those with proprietary interests. This isn't just a problem with
> high-volume public documents: apparently special-field authors interested
> in letting people cite their work accurately (using computer tools to
> automate the process without retyping) aren't always aware (or at least
> were not aware in the recent past) of your security/encryption "features"
> either.
> I'm at least beginning to believe your comments that this is a problem
> with customers and not entirely inherent in PDF as an information
> communication format, but I don't get the idea you do a lot to educate
> them or make the features well known. (Apparently Reader could just as
> easily offer a greyed-out option when viewing the US IRS 1040 forms, so
> that a user like myself could complain to the IRS to "enable user rights"
> instead of blaming Adobe for creating an information black hole, but
> either you or your customers don't want to let users know it is even
> possible.)
>
> Again, this typifies a big concern with selling public agencies on your
> picture format, when many people outside of the Adobe or MSFT
> tools/mindset try to use the information in new ways, especially when
> information from many sources needs to be extracted and analyzed together.
> I don't have an algebra for pictures, but I've managed to get pretty good
> with integer math...
>
>
> I'll even concede that "along with freedom comes responsibility", and
> if you offer a versatile format it can be difficult to sell the right
> defaults to every customer, but in this case it seems the format lacks
> some versatility (versatility has to be realizable, not just hypothetical,
> and it sounds like this comparatively simple task is not simple).
>
> > Leonard
> >
> > From: Doug Moreland [mailto:[email protected]]
> > Sent: Wednesday, March 25, 2009 6:16 PM
> > To: [email protected]
> > Subject: [iText-questions] How to duplicate PDF text but rasterize graphics
> >
> > Looking for advice on the best approach to do something others may have
> > tried. I have PDFs with text and graphics. They are very large, due to the
> > graphics. I want to read each PDF and produce a new PDF with the text just
> > as it was in the original, but rasterize the rest of the graphics into a
> > fairly low-res bitmap to be added behind the text, reducing the overall
> > file size. I do not need to manipulate the text, just replicate it.
> > Everything else can be bitmapped.
> >
> > Any starting places would be greatly appreciated. Thank you.
> >
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
>
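
The original request asks for the non-text content to go into a "fairly low res bitmap" behind the text, and the resolution chosen drives how large that bitmap is. A minimal Java sketch of the arithmetic, assuming the page size is known in PDF user-space units (1/72 inch by default, e.g. taken from the page's /MediaBox or /CropBox); the class and method names are hypothetical:

```java
/**
 * Hypothetical helper: sizes a background bitmap for a PDF page.
 * PDF user space defaults to 1/72 inch per unit, so pixels = points / 72 * dpi.
 */
public class BackgroundSize {

    /** Pixel width and height for a page of the given size (in points) at dpi. */
    public static int[] pixels(double widthPt, double heightPt, int dpi) {
        int w = (int) Math.ceil(widthPt / 72.0 * dpi);
        int h = (int) Math.ceil(heightPt / 72.0 * dpi);
        return new int[] {w, h};
    }

    /** Uncompressed size of an 8-bit-per-channel RGB image (3 bytes per pixel). */
    public static long rawRgbBytes(int[] px) {
        return (long) px[0] * px[1] * 3;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 in)
        for (int dpi : new int[] {72, 100, 150, 300}) {
            int[] px = pixels(612, 792, dpi);
            System.out.printf("%d dpi -> %d x %d px, %d bytes raw RGB%n",
                    dpi, px[0], px[1], rawRgbBytes(px));
        }
    }
}
```

For a Letter page this prints 612 x 792 px at 72 dpi, 850 x 1100 px at 100 dpi, 1275 x 1650 px at 150 dpi and 2550 x 3300 px at 300 dpi; doubling the DPI roughly quadruples the pixel count. The byte counts are before compression, so what actually lands in the PDF depends on the image filter chosen (for example DCTDecode or FlateDecode) and on the page content.
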



-- 
Doug Moreland
