OK, fixed. This is what I did: right-click on the new project -> Debug As -> Debug Configurations -> Source -> Add -> Project, then select the PDFBox project.

Thanks
Dimuthu

On Tue, Feb 25, 2014 at 1:17 PM, DImuthu Upeksha <[email protected]> wrote:

> I'm using Eclipse. This is what I want. I created a new Java application
> project (say TestPDFBox) with a main class with the following code.
>
> PDDocument document = new PDDocument();
> PDPage blankPage = new PDPage();
> document.addPage( blankPage );
> document.save("BlankPage.pdf");
> document.close();
>
> Then I need to add the jar files generated in the target folder of PDFBox to
> the build path of my new project (I did build the PDFBox project from source).
> That is what I did. But let's say I need to check the functionality of the
> document.save("") method. I don't have a reference to its sources
> because I directly used the generated jars. As Tilman said, I built PDFBox from
> sources, but I don't know a proper way to use it in other projects other than
> adding those jar files to the build path.
>
>
> On Tue, Feb 25, 2014 at 1:03 PM, John Hewson <[email protected]> wrote:
>
>> Which IDE are you using? You should be able to run the PDFToText class
>> (in pdfbox-tools) using your IDE and pass a PDF file path as the command
>> line argument.
>>
>> -- John
>>
>> > On 24 Feb 2014, at 22:38, DImuthu Upeksha <[email protected]> wrote:
>> >
>> > Hi John,
>> > Thanks for the reply. Yes, I checked out the PDFBox code and managed to
>> > build it successfully. I looked at the classes you mentioned and I got a
>> > rough idea about how they work. To check them I added the jars in the
>> > target folder to my separate Java project. I tried the samples in
>> > http://pdfbox.apache.org/cookbook/. I need to look further into the code,
>> > especially how those processXXX() methods work in the PDFTextStripper class.
>> > What I usually do is add some breakpoints and check them in the debug
>> > windows. But using jars it's not possible. What is the way you follow in
>> > order to do such a task?
>> >
>> > I also installed Tesseract on my machine and managed to do some OCR
>> > stuff. That's a cool tool which works fine.
>> > I'm still learning the code. If I get any issue I'll drop you a mail.
>> >
>> > Thanks
>> > Dimuthu
>> >
>> >
>> >> On Tue, Feb 25, 2014 at 12:33 AM, John Hewson <[email protected]> wrote:
>> >>
>> >> Hi Dimuthu
>> >>
>> >> The PDFBox website can be found at http://pdfbox.apache.org/ it contains
>> >> a basic overview of the project
>> >> and details on how to obtain the source code and build PDFBox for yourself.
>> >>
>> >> Currently we do not perform any OCR and PDFBOX-1912 details the only
>> >> thoughts so far regarding it.
>> >> Note that the OCR libraries mentioned in the JIRA issue are all under the
>> >> Apache license, which is a
>> >> requirement.
>> >>
>> >> Once you have the source code, take a look at the PageDrawer class to see
>> >> how text and images are
>> >> rendered. We want someone to interface at a low-level (e.g. one glyph,
>> >> word, or sentence at a time) with
>> >> an OCR engine. Also look at PDFTextStripper which is how text is currently
>> >> extracted, take a look at how
>> >> we have to go to great lengths to sort text back into reading order and
>> >> infer the placement of diacritics - PDF
>> >> is fundamentally a visual format, not a structured format like HTML -
>> >> which is why extracting text can be so
>> >> difficult sometimes.
>> >>
>> >> The full PDF Reference document can be found at:
>> >>
>> >> http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf
>> >>
>> >> Feel free to discuss specifics of your proposal or ask any questions.
>> >>
>> >> Thanks,
>> >>
>> >> -- John
>> >>
>> >> On 23 Feb 2014, at 21:13, DImuthu Upeksha <[email protected]> wrote:
>> >>
>> >>> Hi,
>> >>> I am Dimuthu Upeksha, a Computer Engineering Undergraduate at University
>> >>> of Moratuwa Sri Lanka. I successfully completed my GSoC 2013 with the
>> >>> Apache ISIS [1] project. I'm very much interested in OCR and image
>> >>> processing stuff. So I would like to select this project idea as my GSoC
>> >>> 2014 project because I feel like it is the best suited project for me.
>> >>> In university we have also done some research in the OCR area and our
>> >>> group wrote a literature review about increasing the efficiency of OCR
>> >>> systems (attached). Can you please suggest where to start learning about
>> >>> PDFBox?
>> >>>
>> >>> [1]
>> >>> http://google-opensource.blogspot.com/2013/10/google-summer-of-code-veteran-orgs.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+GoogleOpenSourceBlog+%28Google+Open+Source+Blog%29
>> >>>
>> >>> Thank you
>> >>> Dimuthu
>> >>>
>> >>> --
>> >>> Regards
>> >>> W.Dimuthu Upeksha
>> >>> Undergraduate
>> >>> Department of Computer Science And Engineering
>> >>> University of Moratuwa, Sri Lanka
