As the discussion below suggests, there is a lot of confusion
among PDF users about what exactly is contained in the FILE,
what the PDF format allows and what MAY model the rendered page,
and how these things compare to the view they have on the screen.
While I like to make fun of various world views ("the GUI IS the app"),
this is a serious problem in many contexts.
Is there an app (or maybe pdftk does this) that can
outline a PDF document in a way that tells a viewer
what is in the model, whether or not it gets rendered in a given view?

I guess when questions come up like "can I get foo and bar out of my PDF?",
you could just post a link to such a utility so people can determine
whether foo and bar already exist in the file, or where they would be.

I'm trying to write my own for my specific purposes, but a general PDF dumper
may have instructional value too.
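For what it's worth, the crudest possible dumper can be sketched with nothing but the Java standard library: scan the raw bytes for `N G obj` headers and print each object's /Type and /Subtype. This is a hypothetical illustration (the class name `PdfObjectOutline` is made up; it is not an iText or pdftk API). It assumes objects are stored uncompressed, so anything packed into PDF 1.5 compressed object streams will be missed, and a stream that happens to contain the bytes "endobj" will confuse it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical quick-and-dirty dumper: one line per indirect object with its
// /Type and /Subtype. Assumes objects are NOT inside compressed object streams.
final class PdfObjectOutline {
    // Header of an indirect object, e.g. "12 0 obj"
    private static final Pattern OBJ = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");
    private static final Pattern TYPE = Pattern.compile("/Type\\s*/(\\w+)");
    private static final Pattern SUBTYPE = Pattern.compile("/Subtype\\s*/(\\w+)");

    static List<String> outline(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so binary data is harmless.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> lines = new ArrayList<>();
        Matcher m = OBJ.matcher(raw);
        int pos = 0;
        while (m.find(pos)) {
            int start = m.end();
            int end = raw.indexOf("endobj", start);
            if (end < 0) end = raw.length();
            // Inspect only the dictionary, i.e. the text before "stream" (if any).
            int streamAt = raw.indexOf("stream", start);
            boolean hasStream = streamAt >= 0 && streamAt < end;
            String dict = raw.substring(start, hasStream ? streamAt : end);
            StringBuilder sb = new StringBuilder(m.group(1) + " " + m.group(2) + " obj:");
            Matcher t = TYPE.matcher(dict);
            if (t.find()) sb.append(" /Type /").append(t.group(1));
            Matcher s = SUBTYPE.matcher(dict);
            if (s.find()) sb.append(" /Subtype /").append(s.group(1));
            if (hasStream) sb.append(" [stream]");
            lines.add(sb.toString());
            pos = end; // skip this object's body so stream bytes are not rescanned
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfObjectOutline.java file.pdf");
            return;
        }
        for (String line : outline(Files.readAllBytes(Path.of(args[0])))) {
            System.out.println(line);
        }
    }
}
```

Run it as `java PdfObjectOutline.java wwcb.pdf` (Java 11 or later). A listing dominated by `/Subtype /Image` XObjects with few `/Font` objects would hint that the content is scanned rather than typeset, though only a real parser can say for sure.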

Also, I had asked about test files and then thought of
a very good example that helps illustrate many
concerns:

http://www.usda.gov/oce/weather/pubs/Weekly/Wwcb/wwcb.pdf

This file seems to be auto-generated each week with minimal human
input. It looks nice and has a predictable organization when you
check each new issue. 
The above contains a lot of useful information: text, maps, tables, etc.
I may want to just read it, or extract various pieces, which could include
barcodes. The text, of course, would be nice to just scroll through, or
I might build a vocabulary list and then pull extracts near suspicious
words; the tables may be good to correlate with data from other sources;
and it may be nice to pick data off the maps too. It would be better if
the author offered the data that went into making these files (text,
isoclines for the weather maps, CSV input data for tables, etc.), but
often it isn't available.

So the question comes down to: "How do I get such and such from the
PDF file?" Usually the first step is determining whether it exists in
the form you expect (often text, in the case of text or barcodes) or
whether there is some well-known analysis method (usually OCR for
images) that lets me infer that information by more complicated,
indirect approaches. Then the iText involvement would be something like
extracting the piece of the PDF file that can make the image you feed
to your OCR code.
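As a first pass at "does it exist in the form I expect", one could inflate a page's content stream and count text-showing operators (Tj, TJ, ', ") against XObject painting (Do). This is a minimal sketch, assuming you already have the raw bytes of one /FlateDecode content stream (locating them, with iText or otherwise, is left out). The class name `ContentTriage` is hypothetical, and the operator matching is a regex heuristic rather than a real content-stream parser, so text inside string literals can be miscounted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical triage of ONE page content stream. Assumes the caller already
// extracted the raw stream bytes and that the stream uses /FlateDecode.
final class ContentTriage {
    // Text-showing operators (TJ, Tj, ', ") and XObject painting (Do), matched
    // only when delimited. A heuristic: text inside string literals can fool it.
    private static final Pattern OP =
        Pattern.compile("(?<![^\\s)\\]>])(TJ|Tj|'|\"|Do)(?=[\\s(\\[/<]|$)");

    // FlateDecode data is zlib-wrapped, which java.util.zip.Inflater reads by default.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or dictionary-dependent data: keep what we have
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Returns {number of text-showing operators, number of Do operators}.
    static int[] countOps(String content) {
        int text = 0, xobjects = 0;
        Matcher m = OP.matcher(content);
        while (m.find()) {
            if (m.group(1).equals("Do")) xobjects++; else text++;
        }
        return new int[] {text, xobjects};
    }

    static String triage(byte[] flateStream) throws DataFormatException {
        String content = new String(inflate(flateStream), StandardCharsets.ISO_8859_1);
        int[] c = countOps(content);
        if (c[0] > 0) return "text operators found: try text extraction first";
        if (c[1] > 0) return "no text, but XObjects are painted: maybe scanned images, an OCR candidate";
        return "no text or XObjects: vector graphics only";
    }
}
```

A page that only strokes and fills paths falls into the last bucket, which is where a barcode drawn as a series of lines would likely land; that case would probably need the page rendered to an image before any barcode decoding.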
It would be nice if the author included links to RSS or other
feeds used to generate PDF files from real-time data, etc.
Of course, the author could just scan a bunch of pictures and make
a fancy BMP file too...

----------------------------------------
> Date: Thu, 2 Apr 2009 18:31:47 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: [iText-questions] Decoding Barcode in C#
>
> Self, Glen wrote:
>> Good morning,
>>
>> Barcode is a font, actually several different fonts.
>
> It CAN be a font; in iText, it's generally a series of lines that are
> drawn. Also: you're forgetting about 2D barcodes such as PDF417 and
> DataMatrix. Those aren't just lines.
>
>> “get the text back from reading barcode” is like saying “get the text
>> back from reading times roman” or “get the text back from reading courier”
>
> Not always. In some cases, it can even be a form field
> (which makes it easier to get the content).
> --
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info
>
> ------------------------------------------------------------------------------
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
