Re: [iText-questions] How to duplicate PDF text but rasterize graphics

Leonard Rosenthol Thu, 26 Mar 2009 05:27:38 -0700

You just like arguing ;).

I don't think you understand the problem that the person is having.
It's not about simply finding already-rasterized images and replacing them with
alternate versions - that's pretty simple, and there is an example of using
iText for doing just that. The problem is the need to convert/rasterize all
vector artwork on the page into one or more raster images WITHOUT affecting the
text AND WITHOUT changing the visual appearance of the page content. In
the process, you will probably also want to optimize the output.


Actually, this problem is difficult with HTML too. Consider an HTML page that
uses CSS for absolute or relative positioning of content, so that multiple
objects can overlay and intersect with each other. Your task is to reduce that
to the smallest number of objects that produce the same visual result with all
text kept intact.

Now, add to that all the complexities of the PDF rendering model - overlapping 
(Z-ordered) objects, color management, rich transparency model, etc.  And these 
are all things that are being considered for HTML5 - so that the same problems 
would now manifest themselves in that environment as well.

So again, this has nothing to do with structured/tagged PDF - it's a completely 
unrelated problem/issue. 

Leonard

-----Original Message-----
From: Mike Marchywka [mailto:[email protected]] 
Sent: Thursday, March 26, 2009 8:03 AM
To: [email protected]
Subject: Re: [iText-questions] How to duplicate PDF text but rasterize graphics


>
> The ONLY company that offers this technology is Apago (http://www.apago.com)
> in their PDF Enhancer product. The technology is called "Selective
> Rasterization" and works quite nicely (of course, I designed/architected it
> and wrote the associated code - so I am kind of biased ;).
>
Why is this so difficult? What patents or publications came out of this
effort? Can you outline the issues as I would think with html you could
just swap out images. Is there a key objective this would not accomplish? 

I have to buy a product that took a heroic effort to produce
just to let me integrate data from multiple sources
and this is an "enhancement?" LOL

http://www.apagoinc.com/documents/PR-PDF-Enhancer-30.pdf

>
>
> It is NOT a trivial piece of work - which is one of the reasons
> why no one else offers it... it took me a sizable amount of time, and I've been
> doing PDF for >10 years.
>

After reading more of the spec documents, I was hoping to find an opportunity
to congratulate Adobe for distinguishing target content from "artifacts"
and in fact labeling formatting "junk" with a well chosen term :)
The comments in the spec about reflowing and the importance of logical
structure make it sound like there is the potential here for
a reasonably well authored document to appeal to both the automated
data processor and the viewer-of-nice-pictures. 

However, in response to your post above, I now have to ask, "What other simple
tasks related to compacting the pictures and highlighting
the information require heroic efforts once customers commit to PDF formats?"
LOL. With html you could just swap out the images.


[ same rant I've posted 10 times, different words however ]

It sounds like you consider this a design feature. That is great, but
then you end up with situations like the US IRS offering documents to people who
are unable to extract their own tax numbers from the artwork because
no one enabled "user rights" or found some other feature for those with
proprietary interests. This isn't just with high-volume public documents, as
apparently special-field authors interested in letting people
cite their work accurately (using computer tools to automate the
process without retyping) aren't always aware (or at least were not aware
in the recent past) of your security/encryption "features" either.
I'm at least beginning to believe your comments that this is a problem
with customers and not entirely inherent in PDF as an information
communication format, but I don't get the idea you do a lot to educate them
or make the features well known (apparently Reader could just
as easily offer a greyed-out option when viewing the US IRS 1040 forms, so a
user like myself could complain to the IRS to "enable user rights"
instead of blaming Adobe for creating an information black hole,
but either you or your customers don't want to let users know it is even
possible).

Again, this typifies a big concern about selling public agencies on your picture
format when many people outside of the Adobe or MSFT tools/mindset try to use the
information in new ways, especially when information from many sources
needs to be extracted and analyzed together. I don't have an algebra
for pictures, but I've managed to get pretty good with integer math...


I'll even concede that "along with freedom comes responsibility,"
and if you offer a versatile format it can be difficult to
sell the right defaults to every customer, but in this case
it seems the format lacks some versatility (versatility has
to be realizable, not just hypothetical, and it sounds like this
comparatively simple task is not simple).

>
> Leonard
>
> From: Doug Moreland [mailto:[email protected]]
> Sent: Wednesday, March 25, 2009 6:16 PM
> To: [email protected]
> Subject: [iText-questions] How to duplicate PDF text but rasterize graphics
>
> Looking for advice on the best approach to do something others may have tried.
> I have PDFs with text and graphics. They are very large, due to the graphics.
> I want to read each PDF and produce a new PDF with the text just as it was in
> the original, but rasterize the rest of the graphics into a fairly low res
> bitmap to be added behind the text, reducing the overall filesize of the bitmap.
> I do not need to manipulate the text, just replicate it. Everything else can
> be bitmapped.
>
> Any starting places would be greatly appreciated. thank you.

------------------------------------------------------------------------------
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php