Re: Reading PDF Content
Bryan Stevenson wrote:
> Huh... yes, iText does allow for the reading of text from within a PDF doc

no, bruno & paulo are always having to tell folks "sorry no" for that kind of functionality. check the itext archives for may-22, you'll see bruno's latest.

pdfbox "works" for this sort of thing, but what it extracts is kind of strange. i don't have much of a need for this, so looking into it further isn't high on my list.

~|
Introducing the Fusion Authority Quarterly Update. 80 pages of hard-hitting, up-to-date ColdFusion information by your peers, delivered to your door four times a year.
http://www.fusionauthority.com/quarterly
Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245958
Subscription: http://www.houseoffusion.com/lists.cfm/link=s:4
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4
RE: Reading PDF Content
No worries man. Thanks for looking into it.

...
Ben Nadel
Web Developer
Nylon Technology
350 7th Avenue, Floor 10
New York, NY 10001
212.691.1134 x 14
212.691.3477 fax
www.nylontechnology.com

"Some people call me the space cowboy. Some people call me the gangster of love."

-----Original Message-----
From: Bryan Stevenson [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 10, 2006 12:26 PM
To: CF-Talk
Subject: Re: Reading PDF Content

> Bryan,
>
> I have been following the thread and would be very interested in
> reading text from a PDF via iText. Is there an example you can point us to?
>
> Thanks,
> ...
> Ben Nadel

You know what, Ben (and Paul H.)... I may have misspoken on this one. I swear I saw it when I was buried in iText pre MX 7... but now I can't find it anywhere (although I see the examples are WAY better).

I may have confused the ability to read a PDF with the ability to copy a PDF into another PDF (thus assuming reading was possible).

Sorry for any confusion folks... I'll be sure to flog myself later ;-)

Cheers

Bryan Stevenson B.Comm.
VP & Director of E-Commerce Development
Electric Edge Systems Group Inc.
phone: 250.480.0642
fax: 250.480.1264
cell: 250.920.8830
e-mail: [EMAIL PROTECTED]
web: www.electricedgesystems.com

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245957
Re: Reading PDF Content
> Bryan,
>
> I have been following the thread and would be very interested in reading
> text from a PDF via iText. Is there an example you can point us to?
>
> Thanks,
> ...
> Ben Nadel

You know what, Ben (and Paul H.)... I may have misspoken on this one. I swear I saw it when I was buried in iText pre MX 7... but now I can't find it anywhere (although I see the examples are WAY better).

I may have confused the ability to read a PDF with the ability to copy a PDF into another PDF (thus assuming reading was possible).

Sorry for any confusion folks... I'll be sure to flog myself later ;-)

Cheers

Bryan Stevenson

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245952
RE: Reading PDF Content
Bryan,

I have been following the thread and would be very interested in reading text from a PDF via iText. Is there an example you can point us to?

Thanks,
...
Ben Nadel

-----Original Message-----
From: Bryan Stevenson [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 10, 2006 11:56 AM
To: CF-Talk
Subject: Re: Reading PDF Content

> > iText... look on SourceForge
>
> sorry no. itext doesn't have this kind of functionality. i guess it's
> pdfbox but i'm not sure exactly how sure-fire it is. i've always
> understood PDFs to be one-way. i guess i should look into that.

Huh... yes, iText does allow for the reading of text from within a PDF doc

Bryan Stevenson

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245948
Re: Reading PDF Content
> > iText... look on SourceForge
>
> sorry no. itext doesn't have this kind of functionality. i guess it's pdfbox
> but i'm not sure exactly how sure-fire it is. i've always understood PDFs to
> be one-way. i guess i should look into that.

Huh... yes, iText does allow for the reading of text from within a PDF doc

Bryan Stevenson

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245943
Re: Reading PDF Content
> We didn't do any specific Unicode testing (nor any
> truly exhaustive

a simple test of a PDF produced from my usual unicode testbed (http://www.sustainablegis.com/unicode/) produced some strange results.

first off, it seems you have to *know* the encoding of the PDF (at least i didn't see any methods for examining this). i guess not a big deal as everybody's using unicode, right ;-)

next, when i stripped out the text, it handled pretty much all of the unicode text (after i told it to use utf-8), including the hebrew text, which is BIDI. but it 99% flubbed the arabic & farsi (also BIDI). it did get the arabic question marks right (as well as its directionality), but it turned the rest of the text into "nullnullnull". and surprisingly it also flubbed the korean, though the rest of CJK worked (including vietnamese, which is sometimes also lumped into CJK).

i can't see a pattern as to what works & what doesn't. i guess it needs more looking into.

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245808
Re: Reading PDF Content
We didn't do any specific Unicode testing (nor any truly exhaustive testing, as we went the Oracle route before we got to that stage), but I certainly recommend it as a starting point for those who need to test those features.

PDF is inherently strange in terms of text order anyway, as it doesn't treat a document as one flowing stream. It's a set of blocks of positioned text, and even a simple-looking doc can lead to strange results (there's nothing much that can be done about that). I'm not sure what the implications of that are for bidirectional text.

On 7/8/06, Paul Hastings <[EMAIL PROTECTED]> wrote:
> James Holmes wrote:
> > PDFBox works fine - I was using it before I swapped to using Oracle to
> > do it as part of our full-text indexing.
>
> "works fine" meaning 100% translation 100% of the time? works w/unicode, BIDI
> stuff, etc.?

--
CFAJAX docs and other useful articles:
http://jr-holmes.coldfusionjournal.com/

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245799
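
To make the "blocks of positioned text" point concrete, here is a minimal sketch in plain Java. It does not use PDFBox or iText; the `Chunk` class, the coordinates, and the sample words are all made up for illustration. It only shows why joining fragments in the order they are stored can scramble a page, and how a naive sort by position recovers a simple left-to-right layout. Real extractors have to cope with columns, rotated text, and bidirectional scripts, none of which this handles.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReadingOrderSketch {

    /** Hypothetical positioned fragment. In PDF user space, y grows upward. */
    static final class Chunk {
        final double x;
        final double y;
        final String text;

        Chunk(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Joins fragments in the order they happen to be stored in the file. */
    static String inStorageOrder(List<Chunk> chunks) {
        return join(chunks);
    }

    /** Joins fragments top-to-bottom, then left-to-right on the same baseline. */
    static String inReadingOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y).reversed()
                .thenComparingDouble(c -> c.x));
        return join(sorted);
    }

    private static String join(List<Chunk> chunks) {
        StringBuilder out = new StringBuilder();
        for (Chunk c : chunks) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(c.text);
        }
        return out.toString();
    }

    /** Made-up sample: the file stores "world" first, then a footer, then "Hello". */
    static List<Chunk> sample() {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(60, 700, "world"));
        chunks.add(new Chunk(10, 50, "Footer"));
        chunks.add(new Chunk(10, 700, "Hello"));
        return chunks;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = sample();
        System.out.println(inStorageOrder(chunks));
        System.out.println(inReadingOrder(chunks));
    }
}
```

A top-to-bottom, left-to-right sort is itself an assumption about the script: for right-to-left text the second sort key would be wrong, which may be part of why bidirectional content is harder to extract cleanly.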
Re: Reading PDF Content
James Holmes wrote:
> PDFBox works fine - I was using it before I swapped to using Oracle to
> do it as part of our full-text indexing.

"works fine" meaning 100% translation 100% of the time? works w/unicode, BIDI stuff, etc.?

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245793
Re: Reading PDF Content
PDFBox works fine - I was using it before I swapped to using Oracle to do it as part of our full-text indexing.

On 7/8/06, Paul Hastings <[EMAIL PROTECTED]> wrote:
> > iText... look on SourceForge
>
> sorry no. itext doesn't have this kind of functionality. i guess it's pdfbox
> but i'm not sure exactly how sure-fire it is. i've always understood PDFs to
> be one-way. i guess i should look into that.

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245792
Re: Reading PDF Content
> iText... look on SourceForge

sorry no. itext doesn't have this kind of functionality. i guess it's pdfbox but i'm not sure exactly how sure-fire it is. i've always understood PDFs to be one-way. i guess i should look into that.

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245791
Re: Reading PDF Content
Cool! Thanks - I'll keep that in mind. May be immediate, may be a few weeks.

----- Original Message -----
From: "Robertson-Ravo, Neil (RX)" <[EMAIL PROTECTED]>
To: "CF-Talk"
Sent: Friday, July 07, 2006 12:37 PM
Subject: Re: Reading PDF Content

> I have done quite a lot of work with iText and ColdFusion, anything I can
> add - be happy to.
>
> "This e-mail is from Reed Exhibitions (Oriel House, 26 The Quadrant,
> Richmond, Surrey, TW9 1DL, United Kingdom), a division of Reed Business,
> Registered in England, Number 678540. It contains information which is
> confidential and may also be privileged. It is for the exclusive use of the
> intended recipient(s). If you are not the intended recipient(s) please note
> that any form of distribution, copying or use of this communication or the
> information in it is strictly prohibited and may be unlawful. If you have
> received this communication in error please return it to the sender or call
> our switchboard on +44 (0) 20 89107910. The opinions expressed within this
> communication are not necessarily those expressed by Reed Exhibitions."
> Visit our website at http://www.reedexpo.com
>
> -----Original Message-----
> From: Robert Everland III <[EMAIL PROTECTED]>
> To: CF-Talk
> Sent: Fri Jul 07 20:51:25 2006
> Subject: Re: Reading PDF Content
>
> You can download ColdPDF to help you get started in using iText,
> http://www.reactivevision.com/coldpdf.txt , if you figure out how to pull
> out the text please share the code so I can add it to the CFC.
>
> Bob

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245778
Re: Reading PDF Content
iText... look on SourceForge

Bryan Stevenson

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245745
Re: Reading PDF Content
That would be awesome, Neil. So far the functions I have are:

fillInForm(sourcePDF, query, destinationPDF) - This takes a PDF, loops through all of its fields, and matches them with the fields from a query to automatically fill them in.

listFields(sourcePDF) - This returns a comma-delimited list of the field names in the PDF (helps to diagnose issues when trying to match a query up with a PDF).

writePdfFromTiffs(pdfOutput, tiffList) - This will output a .pdf file containing multiple TIFF images.

If you have any other functionality that hasn't been included, shoot me an email off list and I will add it to the CFC and give you credit. I need to start a website and have things such as setup and what to do if you're on MX 7. Just need the time to do it.

Bob Everland

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245743
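
For anyone trying to picture the field matching described above, here is a rough sketch of the idea using only standard Java collections. It is not the ColdPDF code and it makes no iText calls: the PDF's field names and the query row are stand-ins (a `List` and a `Map`), the class and method names are hypothetical, and the case-insensitive comparison is an assumption rather than something stated in the post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class FieldMatchSketch {

    /** Mirrors the described listFields output: field names joined by commas. */
    static String listFields(List<String> pdfFieldNames) {
        return String.join(",", pdfFieldNames);
    }

    /**
     * Returns the value to write into each PDF field that has a same-named
     * query column. Matching ignores case (an assumption for this sketch).
     * Fields without a matching column are left out of the result.
     */
    static Map<String, String> matchFields(List<String> pdfFieldNames,
                                           Map<String, String> queryRow) {
        Map<String, String> byLowerName = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : queryRow.entrySet()) {
            byLowerName.put(column.getKey().toLowerCase(Locale.ROOT), column.getValue());
        }
        Map<String, String> toFill = new LinkedHashMap<>();
        for (String field : pdfFieldNames) {
            String value = byLowerName.get(field.toLowerCase(Locale.ROOT));
            if (value != null) {
                toFill.put(field, value);
            }
        }
        return toFill;
    }

    public static void main(String[] args) {
        // Made-up field names and row values, purely for illustration.
        List<String> fields = List.of("firstName", "lastName", "ssn");
        Map<String, String> row = Map.of("FIRSTNAME", "Ada", "LASTNAME", "Lovelace");
        System.out.println(listFields(fields));
        System.out.println(matchFields(fields, row));
    }
}
```

Comparing the `listFields` output with the keys that come back from `matchFields` is one way to spot which form fields have no query column behind them, which is the diagnostic use mentioned above.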
Re: Reading PDF Content
I have done quite a lot of work with iText and ColdFusion, anything I can add - be happy to.

-----Original Message-----
From: Robert Everland III <[EMAIL PROTECTED]>
To: CF-Talk
Sent: Fri Jul 07 20:51:25 2006
Subject: Re: Reading PDF Content

You can download ColdPDF to help you get started in using iText, http://www.reactivevision.com/coldpdf.txt , if you figure out how to pull out the text please share the code so I can add it to the CFC.

Bob

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245740
Re: Reading PDF Content
You can download ColdPDF to help you get started in using iText, http://www.reactivevision.com/coldpdf.txt , if you figure out how to pull out the text please share the code so I can add it to the CFC.

Bob

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245737
Re: Reading PDF Content
sweet! Thanks. I'll check them both out.

----- Original Message -----
From: "Gareth" <[EMAIL PROTECTED]>
To: "CF-Talk"
Sent: Friday, July 07, 2006 12:37 PM
Subject: Re: Reading PDF Content

> I've done this with pdfbox, an opensource java pdf reader thing.
>
> http://www.pdfbox.org/
>
> There's a post that mentions it here too:
>
> http://www.houseoffusion.com/cf_lists/messages.cfm/forumid:4/threadid:29432
>
> ----- Original Message -----
> From: "Joe Velez" <[EMAIL PROTECTED]>
> To: "CF-Talk"
> Sent: Friday, July 07, 2006 8:08 PM
> Subject: Reading PDF Content
>
> Hi -
>
> I was curious if anyone out there knew of a CF, CFX or other method to
> obtain the [text] content of a PDF file that will run on WIN2K / CFMX6.1
>
> Using CFFILE just returns garbage, and I couldn't find anything useful
> online so far.
>
> Any suggestions would be greatly appreciated.
>
> Thanks
>
> Joe Velez

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245735
Re: Reading PDF Content
iText.

-----Original Message-----
From: Joe Velez <[EMAIL PROTECTED]>
To: CF-Talk
Sent: Fri Jul 07 20:08:03 2006
Subject: Reading PDF Content

Hi -

I was curious if anyone out there knew of a CF, CFX or other method to obtain the [text] content of a PDF file that will run on WIN2K / CFMX6.1

Using CFFILE just returns garbage, and I couldn't find anything useful online so far.

Any suggestions would be greatly appreciated.

Thanks

Joe Velez

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245734
Re: Reading PDF Content
I've done this with pdfbox, an opensource java pdf reader thing.

http://www.pdfbox.org/

There's a post that mentions it here too:

http://www.houseoffusion.com/cf_lists/messages.cfm/forumid:4/threadid:29432

----- Original Message -----
From: "Joe Velez" <[EMAIL PROTECTED]>
To: "CF-Talk"
Sent: Friday, July 07, 2006 8:08 PM
Subject: Reading PDF Content

Hi -

I was curious if anyone out there knew of a CF, CFX or other method to obtain the [text] content of a PDF file that will run on WIN2K / CFMX6.1

Using CFFILE just returns garbage, and I couldn't find anything useful online so far.

Any suggestions would be greatly appreciated.

Thanks

Joe Velez

Archive: http://www.houseoffusion.com/cf_lists/message.cfm/forumid:4/messageid:245732
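
As background on why reading the file directly "just returns garbage": page content in a PDF is commonly stored in streams compressed with the FlateDecode filter (zlib/deflate), so the words are usually not sitting in the file as plain bytes. The sketch below is plain Java using java.util.zip, not PDFBox or iText, and the content string is a made-up example of text-showing operators. It compresses that string the way a PDF writer typically would, shows the raw bytes no longer contain the word, and then inflates them back. Real PDFs add further layers (font encodings, object structure), which is why a dedicated library is still the practical route.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FlateSketch {

    /** Made-up page content: two text-showing operations. */
    static final String CONTENT =
            "BT /F1 12 Tf 72 720 Td (Hello PDF text) Tj ET\n"
          + "BT /F1 12 Tf 72 700 Td (Hello PDF text again) Tj ET\n";

    /** Compresses bytes in zlib format, as a FlateDecode stream would be. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        // Headroom covers the small zlib overhead on short inputs.
        byte[] buffer = new byte[raw.length + 64];
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    /** Decompresses zlib data into at most maxLength bytes. */
    static byte[] inflate(byte[] compressed, int maxLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buffer = new byte[maxLength];
        int total = 0;
        try {
            while (!inflater.finished() && total < buffer.length) {
                int n = inflater.inflate(buffer, total, buffer.length - total);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of spinning
                }
                total += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) {
        byte[] raw = CONTENT.getBytes(StandardCharsets.ISO_8859_1);
        byte[] packed = deflate(raw);
        String asStored = new String(packed, StandardCharsets.ISO_8859_1);
        System.out.println("stored bytes contain 'Hello': " + asStored.contains("Hello"));
        String restored = new String(inflate(packed, raw.length + 16), StandardCharsets.ISO_8859_1);
        System.out.print(restored);
    }
}
```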