Actually, what I want to do with the content almost
amounts to metrics: I want to run through the text
and check each word against some regular expressions,
taking different actions for the words that match
(mostly counting them). I was thinking about using an
ANTLR parser/lexer to do this once I have the text. I
searched FreshMeat yesterday and didn't find
anything that looked geared toward what I'm looking
for. It may be that most of the existing tools are
not designed to be interrupted where I need to
interrupt them.

  Of course when I said a package, I meant an open
source Java API.  :)  I'm cheap.
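
A minimal sketch of the per-word check described above, assuming the
text has already been extracted into a plain String (by JPedal, PdfBox,
or anything else; no calls from those libraries are shown). It uses only
java.util.regex rather than ANTLR, and the class name, rule labels, and
patterns are all hypothetical placeholders, not part of any existing
package.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class WordMetrics {

    // Hypothetical rule set: a label mapped to the pattern a whole word must match.
    static Map<String, Pattern> defaultRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("numbers", Pattern.compile("\\d+"));
        rules.put("allCaps", Pattern.compile("[A-Z]{2,}"));
        rules.put("capitalized", Pattern.compile("[A-Z][a-z]+"));
        return rules;
    }

    // Split the text on whitespace and count, per rule, the words that match it entirely.
    static Map<String, Integer> countMatches(String text, Map<String, Pattern> rules) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : rules.keySet()) {
            counts.put(label, 0);
        }
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields one empty token; skip it
            }
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                if (rule.getValue().matcher(word).matches()) {
                    counts.merge(rule.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Leonard suggested JPEDAL or PdfBox in 2003";
        System.out.println(countMatches(text, defaultRules()));
    }
}
```

Counting is the only action here; a different action per rule could be
attached by mapping each label to a callback instead of a counter.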

Thanks,
Matt

--- Leonard Rosenthol <[EMAIL PROTECTED]> wrote:
> At 03:03 PM 4/3/2003 -0800, Matt Benson wrote:
> >Does anyone (Leonard) know of a package that will do
> >this, or should I implement parsing text from one of
> >JPedal or PdfBox?
> 
>          There are LOTS of PDF indexing engines out there: commercial,
> open source, your choice of languages, etc.  Do a search on
> FreshMeat...
>
>          OR you could indeed use JPEDAL or PdfBox to do it yourself, but
> that's just the extraction; indexing is the harder part to get right,
> esp. if you plan to offer linguistic support (stemming, Unicode, etc.)
> and efficient storage of the tables.
> 
> 
> Leonard
> ---------------------------------------------------------------------------
> Leonard Rosenthol                  <mailto:[EMAIL PROTECTED]>
> Chief Technical Officer            <http://www.pdfsages.com>
> PDF Sages, Inc.                    215-629-3700 (voice)
> 


_______________________________________________
iText-questions mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/itext-questions
