"the J2ME CLDC library does the provides support for the jr6 features like generics, Datastructures like List, Collection, Set,Map,Iterator. They just can work with Vector, Hashtable and enumeration." Did you mean the J2ME CLDC does NOT provide support for...?
The ISO standard for PDF is ISO 32000; however, once you start working with real-world PDFs, you will quickly realize that many of them (perhaps even the majority) do not strictly conform to the standard. We try to handle these non-conforming PDFs as gracefully as possible, but it can be difficult, especially if the problem is a bug in some other software that we haven't encountered yet. If you run text extraction on a PDF and don't get any error or warning messages, it's likely conforming (or at least close enough that PDFBox can handle it). When you start testing, it'd be best to find some small, conforming PDFs with standard fonts. Once you've confirmed these work, you can move on to the more difficult test cases.

Also, some PDFs do not contain "text" in the traditional sense. Instead they contain images of characters, which cannot easily be mapped back to ASCII or Unicode. The only way to extract text from these documents is to use OCR. PDFBox doesn't currently have any OCR, so you'll have to extract the page as an image and then use some other library to do the OCR. As always with OCR, your results may vary. Luckily these documents are not that common and seem to be getting rarer as time goes on.

We can certainly help you troubleshoot any issues you run into and help you contribute patches to make PDFBox run easily on Android and BlackBerry. If you have a choice in the matter, I'd suggest focusing on Android first, since it sounds like it'll be much easier. It will be the quickest route to a functioning product, and you will be able to show your employer/investors/customers that you are making progress.

----
Thanks,
Adam

From: harmanpreet singh <harman....@gmail.com>
To: dev@pdfbox.apache.org
Date: 03/15/2011 21:03
Subject: Re: PDFBox for Hand Held devices

Hi Adam,

Actually, I was trying to make a printing application for BlackBerry, where I wanted to print a native PDF file from the BlackBerry. So I wanted to extract the textual data from the PDF.
For this purpose, I don't want to use any server-side solution like setting up my own BES (BlackBerry Enterprise Server) or a web server. That would make the solution too costly and less efficient. So I have decided to extract text from the PDF within the BlackBerry device.

As for Android, it is actually a mixture of Google APIs on top of the JRE. So it supports the basic features of the JRE like generics, the collections framework, security classes, etc. BlackBerry, on the other hand, is a mixture of the BlackBerry API on top of the J2ME CLDC and MIDP libraries. The J2ME CLDC library does not provide support for JRE 6 features like generics, or data structures like List, Collection, Set, Map and Iterator. It can only work with Vector, Hashtable and Enumeration. Hence the PDFBox library compiled successfully with the desktop JRE, and also with Android (with a bit of effort for the latter). But it doesn't work with J2ME and its associated technologies, for the simple reasons I have explained above.

I have actually started working on it. I am about to achieve 100% success in converting PDFBox's low-level COS APIs so that they compile successfully on BlackBerry. But the problem is that I don't have any idea how this library was developed, though I know in depth how it works. So what I want from the developers of PDFBox is the plan they followed for implementing the ISO standard for PDF. That way I will know whether or not I am working in the right direction. Right now, I don't even know whether my solution will work properly. So I want to match my progress step by step against the desktop version of PDFBox as it went through its development phases. This will make progress easier and traceable. If possible, I would like the developers of this community to lend a helping hand to make this happen.
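Much of the porting work described above is a mechanical rewrite of Java 5+ collection idioms (generics, `List`/`Map`/`Iterator`, for-each loops, autoboxing, `StringBuilder`) into the subset CLDC offers. The sketch below is illustrative only: the class `CldcCollections` and its methods `joinNames` and `increment` are hypothetical names, not PDFBox code, and it assumes a CLDC 1.1-style `java.util` limited to `Vector`, `Hashtable` and `Enumeration`, plus `StringBuffer`. Because those classes also exist on the desktop JDK, the same code can be compiled and compared against desktop behaviour.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical helpers (not PDFBox API) showing CLDC-compatible rewrites of
// Java 5+ collection idioms: raw Vector/Hashtable, Enumeration, explicit casts
// and manual boxing, since CLDC has no generics, no for-each and no autoboxing.
class CldcCollections {

    // Desktop JRE version (does not compile on CLDC):
    //   String joinNames(List<String> names, String sep) {
    //       StringBuilder sb = new StringBuilder();
    //       for (String n : names) { ... }
    //   }
    // CLDC version: walk a raw Vector with an Enumeration and cast each element.
    static String joinNames(Vector names, String separator) {
        StringBuffer sb = new StringBuffer(); // StringBuilder does not exist on CLDC
        Enumeration e = names.elements();
        boolean first = true;
        while (e.hasMoreElements()) {
            String name = (String) e.nextElement(); // cast replaces the generic type
            if (!first) {
                sb.append(separator);
            }
            sb.append(name);
            first = false;
        }
        return sb.toString();
    }

    // Desktop JRE version:
    //   void increment(Map<String, Integer> counts, String key) {
    //       Integer c = counts.get(key);
    //       counts.put(key, c == null ? 1 : c + 1); // relies on autoboxing
    //   }
    // CLDC version: raw Hashtable with explicit boxing and unboxing.
    static void increment(Hashtable counts, String key) {
        Integer current = (Integer) counts.get(key);
        int next = (current == null) ? 1 : current.intValue() + 1;
        // CLDC has no Integer.valueOf(int), so boxing uses the constructor;
        // it is deprecated on modern desktop JDKs but still compiles there.
        counts.put(key, new Integer(next));
    }

    public static void main(String[] args) {
        Vector names = new Vector();
        names.addElement("COSName");
        names.addElement("COSDictionary");
        System.out.println(joinNames(names, ", ")); // prints "COSName, COSDictionary"

        Hashtable counts = new Hashtable();
        increment(counts, "Type");
        increment(counts, "Type");
        System.out.println(counts.get("Type")); // prints "2"
    }
}
```

The same pattern (raw container, explicit cast on the way out, explicit wrapper object on the way in) is what a CLDC port of generic-typed code would generally need to apply throughout, at the cost of losing compile-time type checking on the container contents.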
regards,
Harman

On Tue, Mar 15, 2011 at 10:37 PM, <a...@swmc.com> wrote:
> Harmanpreet,
>
> Welcome to the PDFBox community. I agree that it'd be great to have PDFBox run on mobile devices. There has actually been talk about this before. If you haven't already read PDFBOX-586[1], I would encourage you to check it out (specifically the comments). You'll find that PDFBox does run on Android, but there's plenty of room for improvement.
>
> I'm short on time, I don't have an Android phone, and the last time I tried running the emulator it failed to boot (no errors, it just sat there eating up all the CPU cycles), so I don't know how much I'll be able to contribute. But I remember one of the issues brought up was the size of the jar file and the memory limitations of small devices. However, each dependency removed will drop some features. For example, dropping icu4j will mean "right to left" languages like Arabic won't be supported, or removing Bouncy Castle will mean that encrypted PDFs will not be supported (note: not all encrypted documents require a password, and these are fairly common).
>
> I like Andreas's idea of splitting it into modules. That way you can pick and choose the options you want, and the other options will not be consuming memory/drive space. This will also benefit desktop users, because using less memory is always a good thing.
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-586
>
> ----
> Thanks,
> Adam
>
> From: harmanpreet singh <harman....@gmail.com>
> To: dev@pdfbox.apache.org
> Date: 03/15/2011 00:34
> Subject: PDFBox for Hand Held devices
>
> Hi,
>
> I have newly joined this developer forum and I have a proposal for all of the PDFBox developers. Everything developed in PDFBox to date is a desktop solution. But we are now in the era of mobile technology.
> I think we need to make PDFBox compatible with, and tested on, the J2ME platform; in particular, it must be compatible with various mobile operating systems like BlackBerry, Android, Symbian and Windows. With such a tool for mobile, developers will be able to build applications with the following features:
> Creating PDF files on mobile.
> Editing PDF files on mobile.
> Attaching videos and other content to PDFs on mobile.
> Filling in PDF forms on mobile.
> Viewing PDFs on mobile.
> Audio PDF reader on mobile.
>
> All of this will be possible entirely inside the mobile device. No external tool or solution will be required.
> Hence I think this is the best way to take the PDFBox library forward. Compared with other PDF libraries like iText, PDFBox is a far more versatile library, because it can do almost anything with PDF, but only on the desktop. So my proposal is to take it one step further and make it usable on any handheld device in this universe that uses PDF.
>
> So those developers interested in this new thinking can kindly mail me with what they can offer to this proposal.
> I hope everybody understands that this is a process of taking PDF technology and PDFBox one step forward.
>
> regards,
> with passion for development,
> Harmanpreet Singh
>
> - FHA 203b; 203k; HECM; VA; USDA; Conventional
> - Warehouse Lines; FHA-Authorized Originators
> - Lending and Servicing in over 45 States
> www.swmc.com - www.simplehecmcalculator.com
> Visit www.swmc.com/resources for helpful links on Training, Webinars, Lender Alerts and Submitting Conditions
>
> This email and any content within or attached hereto from Sun West Mortgage Company, Inc. is confidential and/or legally privileged. The information is intended only for the use of the individual or entity named on this email.
> If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this email information is strictly prohibited, and that the documents should be returned to this office immediately by email. Receipt by anyone other than the intended recipient is not a waiver of any privilege. Please do not include your social security number, account number, or any other personal or financial information in the content of the email. Should you have any questions, please call (800) 453 7884.