Re: Reading PDF Content

2006-07-10 Thread Paul Hastings
Bryan Stevenson wrote:
> Huh...yes iText does allow for the reading of text from within a PDF doc

no, bruno & paulo are always having to tell folks "sorry no" for that kind of
functionality. check the itext archives for may-22; you'll see bruno's latest.

pdfbox "works" for this sort of thing, but what it extracts is kind of strange.
i don't have much need for this, so looking into it some more isn't high on my
list.

~|
Introducing the Fusion Authority Quarterly Update. 80 pages of hard-hitting,
up-to-date ColdFusion information by your peers, delivered to your door four 
times a year.
http://www.fusionauthority.com/quarterly

Archive: 
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245958
Subscription: http://www.houseoffusion.com/lists.cfm/link=s:4
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4


RE: Reading PDF Content

2006-07-10 Thread Ben Nadel
No worries man. Thanks for looking into it.

...
Ben Nadel 
Web Developer
Nylon Technology
350 7th Avenue
Floor 10
New York, NY 10001
212.691.1134 x 14
212.691.3477 fax
www.nylontechnology.com
 
"Some people call me the space cowboy. Some people call me the gangster of
love."

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245957


Re: Reading PDF Content

2006-07-10 Thread Bryan Stevenson
> Bryan,
>
> I have been following the thread and would be very interested in reading
> text from a PDF via iText. Is there an example you can point us to.
>
> Thanks,
> ...
> Ben Nadel

You know what Ben (and Paul H.)...I may have misspoken on this one.

I swear I saw it when I was buried in iText pre MX 7...but now I can't find it
anywhere (although I see the examples are WAY better).

I may have confused the ability to read a PDF with the ability to copy a PDF
into another PDF (thus assuming reading was possible).

Sorry for any confusion folks...I'll be sure to flog myself later ;-)

Cheers

Bryan Stevenson B.Comm.
VP & Director of E-Commerce Development
Electric Edge Systems Group Inc.
phone: 250.480.0642
fax: 250.480.1264
cell: 250.920.8830
e-mail: [EMAIL PROTECTED]
web: www.electricedgesystems.com 


Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245952


RE: Reading PDF Content

2006-07-10 Thread Ben Nadel
Bryan,

I have been following the thread and would be very interested in reading
text from a PDF via iText. Is there an example you can point us to.

Thanks,
...
Ben Nadel 

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245948


Re: Reading PDF Content

2006-07-10 Thread Bryan Stevenson
> >iText...look on SourceForge
>
> sorry no. itext doesn't have this kind of functionality. i guess it's pdfbox 
> but i'm not sure exactly how sure-fire it is. i've always understood PDFs to 
> be one-way. i guess i should look into that.

Huh...yes iText does allow for the reading of text from within a PDF doc

Bryan Stevenson B.Comm.


Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245943


Re: Reading PDF Content

2006-07-08 Thread Paul Hastings
>We didn't do any specific Unicode testing (nor any
>truly exhaustive

simple test of a PDF produced from my usual unicode testbed: 

http://www.sustainablegis.com/unicode/

produced some strange results. 

first off it seems you have to *know* the encoding of the PDF (at least i 
didn't see any methods for examining this). i guess not a big deal as 
everybody's using unicode, right ;-)

next when i stripped out the text, it handled pretty much all of the unicode 
text (after i told it to use utf-8) including the hebrew text which is BIDI. 
but it 99% flubbed the arabic & farsi (also BIDI). it did get the arabic 
question marks right (as well as its directionality) but it turned the rest of
the text into "nullnullnull". and surprisingly it also flubbed the korean 
though the rest of CJK worked (including vietnamese which is sometimes also 
lumped into CJK).

i can't see a pattern as to what works & what doesn't. i guess it needs more 
looking into.
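Paul's observation that extraction only behaved after he told it to use utf-8 is consistent with a general Java pitfall: converting between text and bytes without naming a charset silently uses the platform default (typically windows-1252 on an English-language Windows install). The sketch below is plain Java, not PDFBox's API; the class and method names are hypothetical. It encodes with an explicit charset, checks whether mixed Hebrew, Korean and Arabic text survives a round trip, and counts the literal "nullnullnull" runs Paul saw. One plausible but unverified cause of that output is Java string concatenation of a null reference, which yields the text "null".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers, not part of PDFBox or iText.
class ExtractedTextEncoding {

    // Encode with an explicit charset. Calling text.getBytes() with no argument would use
    // the platform default charset instead.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // True if the text comes back unchanged after encoding and decoding with the charset.
    // Charsets that cannot represent a character substitute a replacement (usually '?').
    static boolean survivesRoundTrip(String text, Charset charset) {
        return text.equals(decode(encode(text, charset), charset));
    }

    // Counts non-overlapping occurrences of the literal text "null".
    static int countNullRuns(String extracted) {
        int count = 0;
        for (int i = extracted.indexOf("null"); i != -1; i = extracted.indexOf("null", i + 4)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hebrew, Korean and Arabic samples, written as escapes to avoid source-encoding issues.
        String mixed = "\u05e9\u05dc\u05d5\u05dd \ud55c\uad6d\uc5b4 \u0645\u0631\u062d\u0628\u0627";
        System.out.println("UTF-8 round trip:      " + survivesRoundTrip(mixed, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip: " + survivesRoundTrip(mixed, StandardCharsets.ISO_8859_1));
        System.out.println("null runs: " + countNullRuns("nullnullnull?"));
    }
}
```

The round-trip check only shows whether a charset can carry the text; it cannot tell whether the extractor mapped the PDF's glyphs to the right characters in the first place.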

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245808


Re: Reading PDF Content

2006-07-08 Thread James Holmes
We didn't do any specific Unicode testing (nor any truly exhaustive
testing, as we went the Oracle route before we got to that stage), but
I certainly recommend it as a starting point for those who need to
test those features. PDF is inherently strange in terms of text order
anyway, as it doesn't treat a document as one flowing stream - it's a
set of blocks of positioned text and even a simple looking doc can
lead to strange results (there's nothing much that can be done about
that). I'm not sure what the implications of that are for
bidirectional text.
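Because a PDF page is a set of positioned text blocks rather than one flowing stream, an extractor has to rebuild reading order itself. A minimal sketch, assuming the fragments already carry baseline coordinates in PDF user space (origin at the bottom-left of the page): sort top to bottom, group fragments whose y values fall within a tolerance into one line, then sort each line left to right. Fragment and toLines are hypothetical names, not PDFBox classes, and the tolerance value is an assumption. The left-to-right step is exactly what would go wrong for right-to-left scripts such as Hebrew, Arabic or Farsi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types; PDFBox has its own classes for positioned text that are not used here.
class ReadingOrder {

    // A run of text plus the (x, y) of its baseline in PDF user space. The origin is the
    // bottom-left corner of the page, so a larger y means higher up on the page.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Groups fragments into lines (top to bottom) and orders each line left to right.
    // A fragment joins the current line if its y is within `tolerance` of that line's first fragment.
    static List<String> toLines(List<Fragment> fragments, double tolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort((a, b) -> Double.compare(b.y, a.y)); // highest y (top of page) first

        List<List<Fragment>> lines = new ArrayList<>();
        double lineY = 0;
        for (Fragment f : sorted) {
            if (lines.isEmpty() || Math.abs(f.y - lineY) > tolerance) {
                lines.add(new ArrayList<>());
                lineY = f.y;
            }
            lines.get(lines.size() - 1).add(f);
        }

        List<String> out = new ArrayList<>();
        for (List<Fragment> line : lines) {
            // Left-to-right ordering: wrong for right-to-left scripts.
            line.sort((a, b) -> Double.compare(a.x, b.x));
            StringBuilder sb = new StringBuilder();
            for (Fragment fr : line) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(fr.text);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Fragments arrive in content-stream order, which need not match reading order.
        List<Fragment> page = Arrays.asList(
                new Fragment("world", 120, 700),
                new Fragment("second line", 72, 686),
                new Fragment("Hello", 72, 700.5));
        toLines(page, 2.0).forEach(System.out::println);
    }
}
```

Run as-is, main prints "Hello world" and then "second line". Multi-column layouts would need an extra step (splitting on large x gaps) that this sketch does not attempt.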

On 7/8/06, Paul Hastings <[EMAIL PROTECTED]> wrote:
> James Holmes wrote:
> > PDFBox works fine - I was using it before I swapped to using Oracle to
> > do it as part of our full-text indexing.
>
> "works fine" meaning 100% translation 100% of the time? works w/unicode, BIDI
> stuff, etc.?

-- 
CFAJAX docs and other useful articles:
http://jr-holmes.coldfusionjournal.com/

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245799


Re: Reading PDF Content

2006-07-08 Thread Paul Hastings
James Holmes wrote:
> PDFBox works fine - I was using it before I swapped to using Oracle to
> do it as part of our full-text indexing.

"works fine" meaning 100% translation 100% of the time? works w/unicode, BIDI 
stuff, etc.?

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245793


Re: Reading PDF Content

2006-07-08 Thread James Holmes
PDFBox works fine - I was using it before I swapped to using Oracle to
do it as part of our full-text indexing.

On 7/8/06, Paul Hastings <[EMAIL PROTECTED]> wrote:
> >iText...look on SourceForge
>
> sorry no. itext doesn't have this kind of functionality. i guess it's pdfbox 
> but i'm not sure exactly how sure-fire it is. i've always understood PDFs to 
> be one-way. i guess i should look into that.

-- 
CFAJAX docs and other useful articles:
http://jr-holmes.coldfusionjournal.com/

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245792


Re: Reading PDF Content

2006-07-08 Thread Paul Hastings
>iText...look on SourceForge

sorry no. itext doesn't have this kind of functionality. i guess it's pdfbox 
but i'm not sure exactly how sure-fire it is. i've always understood PDFs to be 
one-way. i guess i should look into that.

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245791


Re: Reading PDF Content

2006-07-07 Thread Joe Velez
Cool!

Thanks - I'll keep that in mind. May be immediate, may be a few weeks.


Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245778


Re: Reading PDF Content

2006-07-07 Thread Bryan Stevenson
iText...look on SourceForge

Bryan Stevenson B.Comm.

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245745


Re: Reading PDF Content

2006-07-07 Thread Robert Everland III
That would be awesome, Neil. So far the functions I have are:

fillInForm(sourcePDF, query, destinationPDF) - This takes a pdf and loops 
through all of the fields in the pdf and matches them with the fields from a 
query to automatically fill them in.

listFields(sourcePDF) - This returns a comma-delimited list of the field names in
the pdf (helps diagnose issues when trying to match a query up with a pdf).

writePdfFromTiffs(pdfOutput, tiffList) - This will output a .pdf file 
containing multiple tiff images.
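The matching step behind fillInForm() can be sketched independently of iText. This is an assumption about how such a function might work, not ColdPDF's actual code: it pairs each form field name with the query column of the same name, ignoring case because ColdFusion column names are case-insensitive, and collects the fields that found no column, which is the kind of mismatch listFields() helps diagnose. A query row is modelled here as a plain Map, and FormFieldMatcher is a hypothetical name. Writing the values into the PDF would still go through iText's form API (PdfStamper and AcroFields in iText 2.x), which is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the matching step only; it does not read or write a PDF.
class FormFieldMatcher {

    // Pairs each form field with the row value whose column name matches it, ignoring case.
    // Fields with no matching column (or a null value) are appended to `unmatched`.
    static Map<String, String> match(List<String> fieldNames, Map<String, String> row,
                                     List<String> unmatched) {
        Map<String, String> byColumn = new HashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            byColumn.put(e.getKey().toUpperCase(Locale.ROOT), e.getValue());
        }
        Map<String, String> filled = new LinkedHashMap<>(); // keeps the PDF's field order
        for (String field : fieldNames) {
            String value = byColumn.get(field.toUpperCase(Locale.ROOT));
            if (value != null) {
                filled.put(field, value);
            } else {
                unmatched.add(field);
            }
        }
        return filled;
    }

    // Comma-delimited field list, in the spirit of listFields().
    static String listFields(List<String> fieldNames) {
        return String.join(",", fieldNames);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("firstName", "lastName", "signature");
        Map<String, String> row = new HashMap<>();
        row.put("FIRSTNAME", "Joe");
        row.put("LASTNAME", "Velez");
        List<String> unmatched = new ArrayList<>();
        System.out.println(listFields(fields));
        System.out.println(match(fields, row, unmatched));
        System.out.println("unmatched: " + unmatched);
    }
}
```

Reporting unmatched fields instead of silently skipping them keeps the diagnosis that listFields() is meant for in the same call.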

If you have any other functionality that hasn't been included, shoot me an email
off list and I will add it to the CFC and give you credit. I need to start a
website covering things such as setup and what to do if you're on MX 7; I just
need the time to do it.



Bob Everland

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245743


Re: Reading PDF Content

2006-07-07 Thread Robertson-Ravo, Neil (RX)
I have done quite a lot of work with iText and ColdFusion, anything I can
add - be happy to.




"This e-mail is from Reed Exhibitions (Oriel House, 26 The Quadrant,
Richmond, Surrey, TW9 1DL, United Kingdom), a division of Reed Business,
Registered in England, Number 678540.  It contains information which is
confidential and may also be privileged.  It is for the exclusive use of the
intended recipient(s).  If you are not the intended recipient(s) please note
that any form of distribution, copying or use of this communication or the
information in it is strictly prohibited and may be unlawful.  If you have
received this communication in error please return it to the sender or call
our switchboard on +44 (0) 20 89107910.  The opinions expressed within this
communication are not necessarily those expressed by Reed Exhibitions." 
Visit our website at http://www.reedexpo.com

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245740


Re: Reading PDF Content

2006-07-07 Thread Robert Everland III
You can download ColdPDF to help you get started using iText
(http://www.reactivevision.com/coldpdf.txt). If you figure out how to pull out
the text, please share the code so I can add it to the CFC.




Bob

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245737


Re: Reading PDF Content

2006-07-07 Thread Joe Velez
sweet!
Thanks. I'll check them both out.

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245735


Re: Reading PDF Content

2006-07-07 Thread Robertson-Ravo, Neil (RX)
iText.

Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245734


Re: Reading PDF Content

2006-07-07 Thread Gareth
I've done this with pdfbox, an opensource java pdf reader thing.

http://www.pdfbox.org/

There's a post that mentions it here too:

http://www.houseoffusion.com/cf_lists/messages.cfm/forumid:4/threadid:29432

- Original Message - 
From: "Joe Velez" <[EMAIL PROTECTED]>
To: "CF-Talk" 
Sent: Friday, July 07, 2006 8:08 PM
Subject: Reading PDF Content


Hi -

I was curious if anyone out there knew of a CF, CFX or other method to 
obtain the [text] content of a PDF file that will run on WIN2K / CFMX6.1

Using CFFILE just returns garbage, and I couldn't find anything useful 
online so far.

Any suggestions would be greatly appreciated.

Thanks

 Joe Velez
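As for why CFFILE "just returns garbage": in many PDFs the page text sits inside content streams compressed with the /FlateDecode filter, whose data is in zlib format, so reading the file as text shows compressed bytes. A sketch of that first decoding step using only java.util.zip; FlateStream is a hypothetical name, and locating each stream inside the file (its object dictionary and /Length) is assumed to have been done already. Even after inflating, the result is a list of PDF drawing operators such as Tj, and fonts may use custom encodings, so a library like PDFBox is still needed for real extraction.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper; real extraction would also need to parse the PDF's object structure.
class FlateStream {

    // Inflates the raw bytes of a PDF stream whose /Filter is /FlateDecode (zlib format).
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // default mode expects the zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated stream or preset dictionary required");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Stand-in for a content stream: text operators, compressed the way FlateDecode expects.
        byte[] ops = "BT /F1 12 Tf 72 700 Td (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        Deflater deflater = new Deflater();
        deflater.setInput(ops);
        deflater.finish();
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        byte[] chunk = new byte[64];
        while (!deflater.finished()) {
            packed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();

        byte[] raw = packed.toByteArray();   // what a plain text read of the file would show
        String decoded = new String(inflate(raw), StandardCharsets.US_ASCII);
        System.out.println(decoded);         // operators, not yet plain page text
    }
}
```

Run as-is, main prints the operator string BT /F1 12 Tf 72 700 Td (Hello) Tj ET, which shows how far a hand-rolled approach gets before font handling and text positioning take over.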



Archive:
http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245732