Hi Dimuthu,

This is a good start. One point to address: a String in Java is stored as UTF-16, so a getUTF8Text() method that returns a String is misleading, and the conversion is probably not being done correctly. The method should convert Tesseract's UTF-8 output to a String internally and be renamed to getText(). You can probably do the conversion in Java rather than in C++ (or maybe Tesseract can return UTF-16?).
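Here is a minimal sketch of doing the conversion on the Java side. It assumes the native method is changed to return Tesseract's raw UTF-8 bytes as a byte[]. The native method name in the comment, getUTF8Bytes, is hypothetical and not part of the existing wrapper. Decoding with an explicit charset gives a correct String whatever the platform's default encoding is:

```java
import java.nio.charset.StandardCharsets;

public class TessTextDecoding {

    // Hypothetical wrapper shape (not the existing API): the JNI side returns the
    // bytes of Tesseract's UTF-8 result instead of building a jstring itself:
    //   private native byte[] getUTF8Bytes();
    //   public String getText() { return decodeUtf8(getUTF8Bytes()); }
    // The native method is omitted so this sketch runs without the JNI library.

    /** Decodes UTF-8 bytes from the native side into a Java String (UTF-16 internally). */
    static String decodeUtf8(byte[] utf8) {
        if (utf8 == null) {
            return ""; // treat "no result" as empty text
        }
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "naïve" as UTF-8: the 'ï' (U+00EF) is encoded as the two bytes 0xC3 0xAF.
        byte[] fromNative = {0x6E, 0x61, (byte) 0xC3, (byte) 0xAF, 0x76, 0x65};
        String text = decodeUtf8(fromNative);
        System.out.println(text + " (" + text.length() + " chars)");
    }
}
```

Returning bytes also avoids a JNI subtlety. NewStringUTF expects *modified* UTF-8, which encodes NUL and characters outside the Basic Multilingual Plane differently from the standard UTF-8 that Tesseract produces.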
Cheers

-- John

On 16 Mar 2014, at 06:15, DImuthu Upeksha <[email protected]> wrote:

> Hi John,
>
> For now I'm using those methods to debug the wrapper. I'll remove them once I have finished testing.
>
> I have started implementing an OCR plugin [1] for PDFBox. It currently satisfies the basic requirements, such as getting word + location data [2]. Please have a look and let me know if any changes are required.
>
> [1] https://github.com/DImuthuUpe/OCR-Plugin
> [2] https://github.com/DImuthuUpe/OCR-Plugin/blob/master/src/main/java/org/apache/pdfbox/ocr/OCRConnector.java
>
> Thanks
> Dimuthu
>
> On Fri, Mar 14, 2014 at 12:09 AM, John Hewson <[email protected]> wrote:
>> Thanks, I saw your new refactoring too, it's good. The following methods are now no longer needed:
>>
>> public void setImagePath(String path)
>> public void setImage(byte[] imagedata, int width, int height, int bpp, int bpl)
>>
>> Cheers
>>
>> -- John
>>
>> On 11 Mar 2014, at 22:58, DImuthu Upeksha <[email protected]> wrote:
>>
>>> Hi John,
>>> Yes. I implemented a new method [1] that accepts the image as a byte stream. We can't send BufferedImage objects directly to the native side, so I convert the BufferedImage into a byte array and pass that in. The native side then converts it back into a compatible format. Along with the bytes we need to pass some metadata: the image width, height, bytes per pixel and bytes per row. I checked it with this test case [2] and it works fine.
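The word + location data only becomes useful to PDFTextStripper once the OCR bounding boxes are expressed in PDF coordinates. Here is a rough sketch of that mapping. It assumes:

- the page was rendered at a known DPI;
- the page is unrotated and its crop box starts at (0, 0);
- the OCR engine reports boxes as left/top/width/height in pixels with a top-left origin.

Under those assumptions the mapping is a scale of 72/dpi plus a vertical flip, because PDF user space is measured in 1/72 inch from the bottom-left corner. The class and method names are made up for illustration:

```java
import java.util.Locale;

public class OcrBoxMapper {

    /** Axis-aligned rectangle; origin and units depend on the coordinate space it is in. */
    static final class Box {
        final double x;
        final double y;
        final double width;
        final double height;

        Box(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "[x=%.1f, y=%.1f, w=%.1f, h=%.1f]", x, y, width, height);
        }
    }

    /**
     * Maps a word box from image pixels (top-left origin, y down) to PDF user space
     * (points = 1/72 inch, bottom-left origin, y up). Assumes an unrotated page whose
     * crop box starts at (0, 0) and an image rendered at {@code dpi}.
     */
    static Box toUserSpace(Box px, double dpi, double pageHeightPt) {
        double scale = 72.0 / dpi; // pixels -> points
        double x = px.x * scale;
        double width = px.width * scale;
        double height = px.height * scale;
        // The box's bottom edge in the image is at y + height; flip it against the page height.
        double y = pageHeightPt - (px.y + px.height) * scale;
        return new Box(x, y, width, height);
    }

    public static void main(String[] args) {
        // A word at (300, 150), 600 x 100 px, on a US Letter page (612 x 792 pt) rendered at 300 dpi.
        Box word = new Box(300, 150, 600, 100);
        System.out.println(toUserSpace(word, 300, 792)); // [x=72.0, y=732.0, w=144.0, h=24.0]
    }
}
```

Rotated pages, or a crop box with a non-zero origin, would need an extra rotation and offset, which this sketch deliberately leaves out.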
>>>
>>> [1] https://github.com/DImuthuUpe/Tesseract-API/blob/master/src/main/java/com/apache/pdfbox/ocr/tesseract/TessBaseAPI.java#L74
>>> [2] https://github.com/DImuthuUpe/Tesseract-API/blob/master/src/test/java/com/apache/pdfbox/ocr/tesseract/TessByteSteamTest.java
>>>
>>> Thanks
>>> Dimuthu
>>>
>>> On Wed, Mar 12, 2014 at 12:40 AM, John Hewson <[email protected]> wrote:
>>>> Hi Dimuthu
>>>>
>>>> The Tesseract wrapper needs to take its input from a BufferedImage rather than reading a file from disk, so instead of:
>>>>
>>>> api.setImagePath("test.tif");
>>>>
>>>> what we need is:
>>>>
>>>> BufferedImage image = ImageIO.read(new File("test.tif"));
>>>> api.setImage(image);
>>>>
>>>> This will let us use the BufferedImage generated by PDFRenderer without round-tripping to the disk.
>>>>
>>>> -- John
>>>>
>>>> On 11 Mar 2014, at 11:13, DImuthu Upeksha <[email protected]> wrote:
>>>>
>>>>> Hi John,
>>>>> Thanks for the guidance.
>>>>> I did a small analysis of the accuracy and performance of the new Tesseract wrapper. I used this image [1] as the input and got the following data [2] after OCR. The first line is the recognised word, followed by the location details (bounding box) of the word. I think these details are enough for our task. What remains now is converting the PDF file into an image, as you mentioned; I'm working on that at the moment.
>>>>>
>>>>> [1] https://www.dropbox.com/s/11wahtonoz08zmn/image4.TIF
>>>>> [2] https://gist.github.com/DImuthuUpe/9491660
>>>>>
>>>>> Thanks
>>>>> Dimuthu
>>>>>
>>>>> On Mon, Mar 10, 2014 at 2:30 PM, John Hewson <[email protected]> wrote:
>>>>>> Dimuthu,
>>>>>>
>>>>>>> I finished a basic implementation of the JNI wrapper for Tesseract. It can now be built using Maven. Some useful methods needed for basic OCR have been implemented.
>>>>>>
>>>>>> Great, it's looking good, nice and clean.
>>>>>>
>>>>>>> 1. What is the task of the processStream method in the PDFTextStripper class (line 456)?
>>>>>>> processStream( page.findResources(), content, page.findCropBox(), page.findRotation() );
>>>>>>
>>>>>> A PDF file is made up of pages, each of which contains a "content stream". This content stream contains a list of drawing commands such as "move to 10,15" or "write the word `foo`"; these are called operators. The processStream function reads the stream for the current page and executes each of the operators. Each operator is implemented in its own class, which is a subclass of PDFOperator. The constructor of PDFStreamEngine creates the operator classes using reflection, which is rather odd, and I'm not sure why this design was chosen. The operators used by PDFTextStripper can be found in org/apache/pdfbox/resources/PDFTextStripper.properties
>>>>>>
>>>>>>> 2. Say I need to extract images and their metadata from a PDF. What is the best approach?
>>>>>>
>>>>>> You could subclass PDFTextStripper and override the startDocument method, using it to create a PDFRenderer and store it in a field. Then override the processPage method and use the previously created PDFRenderer to render the current page to a BufferedImage and perform OCR on the image. Once you have the OCR text + positions, instead of calling processStream you can call processTextPosition once for each character + position.
>>>>>>
>>>>>> The PDFRenderer class was just added to the trunk, so make sure you do an "svn update". Let me know if you need me to change PDFTextStripper to make it easier to subclass.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -- John
>>>>>>
>>>>>> On 9 Mar 2014, at 09:08, DImuthu Upeksha <[email protected]> wrote:
>>>>>>
>>>>>>> Hi John,
>>>>>>> I finished a basic implementation of the JNI wrapper for Tesseract. It can now be built using Maven. Some useful methods needed for basic OCR have been implemented.
>>>>>>>
>>>>>>> I went through the PDFBox code several times and have a couple of questions that need clarifying:
>>>>>>>
>>>>>>> 1. What is the task of the processStream method in the PDFTextStripper class (line 456)?
>>>>>>> processStream( page.findResources(), content, page.findCropBox(), page.findRotation() );
>>>>>>>
>>>>>>> 2. Say I need to extract images and their metadata from a PDF. What is the best approach?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Dimuthu
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 7, 2014 at 9:26 PM, DImuthu Upeksha <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi John
>>>>>>>> I refactored the Tesseract JNI code [1] to support a Maven build. To create the JNI library I added pre-built static libraries of Tesseract and Leptonica to the resources folder [2]. For now it only includes the libraries for Mac, but we can easily add both Windows and Linux libraries. After "mvn clean install", the jar is created under the target folder. All the setup is now done; what remains is implementing the native methods in tessbaseapi.cpp [3]. I hope to finish it as soon as possible. Please let me know if you have any concerns about the project structure.
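Because the native code ends up inside the jar, it has to be copied to a real file before the JVM can load it: System.load only accepts an absolute path on disk. Below is a minimal sketch of that step. The class name is hypothetical, the caller supplies the resource path, and nothing here is taken from the Tesseract-API repository:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class NativeLibraryLoader {

    private NativeLibraryLoader() {}

    /**
     * Copies a native library packaged as a classpath resource (e.g. inside the jar)
     * into a fresh temporary directory and returns the path of the copy.
     */
    static Path extract(String resourcePath) throws IOException {
        try (InputStream in = NativeLibraryLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Native library not found on classpath: " + resourcePath);
            }
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path dir = Files.createTempDirectory("tess-jni");
            Path target = dir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            // Best-effort cleanup. deleteOnExit runs in reverse registration order,
            // so the file is removed before its directory.
            dir.toFile().deleteOnExit();
            target.toFile().deleteOnExit();
            return target;
        }
    }

    /** Extracts the library and loads it into the JVM. */
    static void load(String resourcePath) throws IOException {
        System.load(extract(resourcePath).toAbsolutePath().toString());
    }
}
```

For example, NativeLibraryLoader.load("/native/libtessjni.dylib") could be called from a static initialiser in the wrapper class before any native method runs. That resource path is only an assumed layout. On Windows a loaded DLL usually cannot be deleted while the JVM is running, so the cleanup there may silently fail.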
>>>>>>>>
>>>>>>>> [1] https://github.com/DImuthuUpe/Tesseract-API.git
>>>>>>>> [2] https://github.com/DImuthuUpe/Tesseract-API/tree/master/src/main/resources
>>>>>>>> [3] https://github.com/DImuthuUpe/Tesseract-API/blob/master/src/main/native/src/tessbaseapi.cpp
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Dimuthu
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 6, 2014 at 1:15 AM, John Hewson <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Dimuthu
>>>>>>>>>
>>>>>>>>>> There are a lot of code fragments in the current Android JNI wrapper that use "(jint)somePointer" casts. These will cause serious memory errors in 64-bit environments, because pointers are 64 bits wide there. So I believe writing it from scratch is much better.
>>>>>>>>>
>>>>>>>>> That's a classic 64-bit pitfall, well spotted. We definitely need to support 64-bit JVMs.
>>>>>>>>>
>>>>>>>>>> we can use the static library of Leptonica (I did and it worked nicely). I don't think using its static library is an issue, because both Tesseract and Leptonica are under the Apache licence.
>>>>>>>>>
>>>>>>>>> Sounds good, I found the following in the README:
>>>>>>>>>
>>>>>>>>> Leptonica is required. (www.leptonica.com). Tesseract no longer compiles without Leptonica.
>>>>>>>>>
>>>>>>>>> Which makes sense.
>>>>>>>>>
>>>>>>>>> -- John
>>>>>>>>>
>>>>>>>>> On 5 Mar 2014, at 09:45, DImuthu Upeksha <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>> +1 for your suggestion about doing the image <=> byte array conversion on the Java side. It removes a lot of complexity. I don't know whether you have noticed, but the jint data type in JNI is a 32-bit integer type. I noticed it on my Mac but don't know about other operating systems.
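The JNI specification defines jint as a signed 32-bit integer and jlong as a signed 64-bit integer on every platform, so the behaviour seen on the Mac is not OS-specific. The sketch below shows, purely in Java, what a (jint) cast does to a 64-bit address. The address values are invented for illustration. It also notes the usual fix of keeping native handles in a long field; the member names in that comment are hypothetical:

```java
public class PointerWidthDemo {

    /**
     * Simulates storing a native pointer in a jint (a Java int) and reading it back:
     * the cast keeps only the low 32 bits, and widening then sign-extends them.
     */
    static long roundTripThroughInt(long address) {
        int asJint = (int) address;
        return asJint;
    }

    // Usual fix (hypothetical names): keep the handle as a long, which maps to jlong in C++:
    //   private long nativeHandle;
    //   private static native long nativeCreate();   // C++: return reinterpret_cast<jlong>(api);
    //   private static native void nativeDelete(long handle);

    public static void main(String[] args) {
        long address = 0x00007F8A12345678L; // invented example of a 64-bit user-space address
        long back = roundTripThroughInt(address);
        System.out.printf("original: 0x%016X%n", address);
        System.out.printf("via jint: 0x%016X%n", back);
        System.out.println("pointer preserved? " + (back == address));
    }
}
```

The truncated value points somewhere unrelated, so dereferencing it on the native side corrupts memory or crashes. Deleting through it never frees the original object.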
>>>>>>>>>>
>>>>>>>>>> Leptonica is the image processing library used by Tesseract [1]: Tesseract uses the image processing algorithms in Leptonica to implement its OCR algorithms. This file [2] is responsible for creating the Tesseract API. You can see that it includes allheaders.h, which is the main header file of Leptonica. So I think we must build Leptonica first and link it when we build Tesseract. This is not a big problem if we can use the static library of Leptonica (I did and it worked nicely). I don't think using its static library is an issue, because both Tesseract and Leptonica are under the Apache licence.
>>>>>>>>>>
>>>>>>>>>> I'm working on the Maven implementation you mentioned and will get back to you soon.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Dimuthu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] https://code.google.com/p/tesseract-ocr/wiki/Compiling
>>>>>>>>>> [2] https://github.com/DImuthuUpe/Tesseract-API/blob/master/jni/tesseract/src/api/tesseractmain.cpp
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 5, 2014 at 1:15 AM, John Hewson <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Dimuthu,
>>>>>>>>>>>
>>>>>>>>>>> 1,2,3:
>>>>>>>>>>>
>>>>>>>>>>> Feel free to write your own Tesseract binding or port the existing code as you see fit.
>>>>>>>>>>> The JNI binding should be minimal: only the methods you require need to be wrapped.
>>>>>>>>>>> Also, don't forget that some of the interop can be done in Java. For example, if it is easier to convert a BufferedImage to a byte array in Java, then do it there and pass the result to JNI rather than writing lots of JNI C++ to achieve the same result.
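Here is a minimal sketch of that Java-side conversion. It assumes the native side expects tightly packed 8-bit RGB rows (3 bytes per pixel, no row padding) together with the width, height, bytes-per-pixel and bytes-per-row metadata. Those fields line up with the setImage(byte[] imagedata, int width, int height, int bpp, int bpl) method mentioned elsewhere in this thread. The class name ImageBytes is hypothetical. BufferedImage.getRGB normalises whatever colour model the source image uses, so the loop does not care how the image was produced:

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;

public final class ImageBytes {

    final byte[] data;        // pixel bytes, row by row, R G B per pixel
    final int width;          // pixels per row
    final int height;         // number of rows
    final int bytesPerPixel;  // 3 for 8-bit RGB
    final int bytesPerLine;   // width * bytesPerPixel (no padding)

    private ImageBytes(byte[] data, int width, int height, int bytesPerPixel, int bytesPerLine) {
        this.data = data;
        this.width = width;
        this.height = height;
        this.bytesPerPixel = bytesPerPixel;
        this.bytesPerLine = bytesPerLine;
    }

    /** Packs any BufferedImage into tightly packed 8-bit RGB rows, dropping alpha. */
    static ImageBytes fromRgb(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        int bpp = 3;
        int bpl = w * bpp;
        byte[] out = new byte[bpl * h];
        int[] row = new int[w];
        for (int y = 0; y < h; y++) {
            // getRGB converts from the image's colour model to packed 0xAARRGGBB ints.
            image.getRGB(0, y, w, 1, row, 0, w);
            for (int x = 0; x < w; x++) {
                int argb = row[x];
                int i = y * bpl + x * bpp;
                out[i] = (byte) (argb >> 16);    // red
                out[i + 1] = (byte) (argb >> 8); // green
                out[i + 2] = (byte) argb;        // blue
            }
        }
        return new ImageBytes(out, w, h, bpp, bpl);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFF0000); // red pixel
        img.setRGB(1, 0, 0x0000FF); // blue pixel
        ImageBytes b = fromRgb(img);
        System.out.println(b.width + "x" + b.height + ", bpp=" + b.bytesPerPixel
                + ", bpl=" + b.bytesPerLine + ", bytes=" + Arrays.toString(b.data));
    }
}
```

Copying pixel by pixel is not the fastest option. An image of type TYPE_3BYTE_BGR already holds a byte buffer that could be passed directly, but it stores the bytes in B, G, R order, which the native side would then have to account for.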
>>>>>>>>>>> >>>>>>>>>>> Your GitHub repo looks like a good start, I can make comments there >>>>>>>>>>> as >>>>>>>>>>> things progress. >>>>>>>>>>> >>>>>>>>>>> Is it possible to build Tesseract without leptonica? I was under the >>>>>>>>>>> impression that it was >>>>>>>>>>> used for image i/o only, but I may be misinformed. >>>>>>>>>>> >>>>>>>>>>> 4: The native platform library should be built as part of the Maven >>>>>>>>> build >>>>>>>>>>> for the Tesseract >>>>>>>>>>> wrapper which can be a separate project. The output can be a jar >>>>>>>>>>> file >>>>>>>>>>> which contains the >>>>>>>>>>> native binaries. It should be possible for the jar to contain >>>>>>>>>>> prebuilt >>>>>>>>>>> binaries for all platforms >>>>>>>>>>> but this is something we can worry about later. Right now the goal >>>>>>>>> should >>>>>>>>>>> be to build a jar >>>>>>>>>>> containing just the current platform's native binary and any Java >>>>>>>>> wrapper >>>>>>>>>>> code. >>>>>>>>>>> >>>>>>>>>>> -- John >>>>>>>>>>> >>>>>>>>>>> On 3 Mar 2014, at 16:41, DImuthu Upeksha >>>>>>>>>>> <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi John, >>>>>>>>>>>> >>>>>>>>>>>> I tried to reuse that android jni wrapper for tesseract. Here is my >>>>>>>>>>>> observation >>>>>>>>>>>> >>>>>>>>>>>> 1. This wrapper heavily depends on android image libraries. >>>>>>>>>>>> (android/bitmap.h). Most of the wrapper methods [1] use this >>>>>>>>>>>> library. >>>>>>>>>>>> >>>>>>>>>>>> 2. But I can understand underlying logic in each function. >>>>>>>>>>>> Basically >>>>>>>>> what >>>>>>>>>>>> it does is mapping between tesseract api functions [2] with java >>>>>>>>> methods. >>>>>>>>>>>> In between it does to some image <=> byte array like conversions by >>>>>>>>> using >>>>>>>>>>>> that bitmap libraries in Android >>>>>>>>>>>> >>>>>>>>>>>> 3. There are two ways. 
1: We can port it's code to make compatible >>>>>>>>> with >>>>>>>>>>> our >>>>>>>>>>>> environments(linux,windows and mac) which is really painful. Also >>>>>>>>>>>> it >>>>>>>>> will >>>>>>>>>>>> cause memory leaks. 2: We can use only it's function signatures and >>>>>>>>>>>> implement using our codes >>>>>>>>>>>> >>>>>>>>>>>> I think 2nd solution is better because we need only few operations >>>>>>>>>>>> to >>>>>>>>> be >>>>>>>>>>>> done using tesseract library. I have created a github repo [3] for >>>>>>>>> this. >>>>>>>>>>>> It's still not finished. I need to add some make files and build >>>>>>>>> files to >>>>>>>>>>>> make it run properly. And also I need to implement those wrapper >>>>>>>>>>> functions >>>>>>>>>>>> [3]. This may take some time. >>>>>>>>>>>> >>>>>>>>>>>> 4. Because we are calling native libraries we need different >>>>>>>>>>>> builds of >>>>>>>>>>>> tesseract and leptonica libraries for each platform (dll for >>>>>>>>>>>> windows, >>>>>>>>> so >>>>>>>>>>>> for linux, dylib for mac). So we may need to build those libraries >>>>>>>>>>>> at >>>>>>>>> the >>>>>>>>>>>> time we build pdfbox project. Or we can pre build those libraries >>>>>>>>>>>> and >>>>>>>>> add >>>>>>>>>>>> them to the project as .dll, .so or .dylib format. What is the >>>>>>>>> preferred >>>>>>>>>>>> way? 
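If the libraries are prebuilt per platform, the wrapper has to pick the right binary at runtime. Here is a sketch of that selection. It assumes a hypothetical jar layout of /native/<os>/<arch>/<file> and a hypothetical library base name, tessjni. The file-name conventions themselves (name.dll, libname.so, libname.dylib) are the standard ones, and System.mapLibraryName applies the same rules for the current platform:

```java
import java.util.Locale;

public final class NativePlatform {

    private NativePlatform() {}

    /** Maps the os.name system property to a coarse OS family. */
    static String osFamily(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "windows";
        }
        if (os.startsWith("mac") || os.startsWith("darwin")) {
            return "mac";
        }
        if (os.startsWith("linux")) {
            return "linux";
        }
        throw new UnsupportedOperationException("No prebuilt Tesseract binary for: " + osName);
    }

    /** Standard shared-library file names: foo.dll, libfoo.dylib, libfoo.so. */
    static String libraryFileName(String baseName, String family) {
        switch (family) {
            case "windows":
                return baseName + ".dll";
            case "mac":
                return "lib" + baseName + ".dylib";
            case "linux":
                return "lib" + baseName + ".so";
            default:
                throw new IllegalArgumentException("Unknown OS family: " + family);
        }
    }

    /** Resource path under the assumed layout, e.g. /native/linux/amd64/libtessjni.so */
    static String resourcePath(String baseName) {
        String family = osFamily(System.getProperty("os.name"));
        String arch = System.getProperty("os.arch").toLowerCase(Locale.ROOT);
        return "/native/" + family + "/" + arch + "/" + libraryFileName(baseName, family);
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("tessjni"));
        System.out.println(System.mapLibraryName("tessjni")); // the JDK's naming for this platform
    }
}
```

Including os.arch in the path leaves room for separate 32-bit and 64-bit binaries, which matters given the 64-bit JVM requirement discussed in this thread.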
>>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://code.google.com/p/tesseract-android-tools/source/browse/tesseract-android-tools/jni/com_googlecode_tesseract_android/tessbaseapi.cpp >>>>>>>>>>>> [2] https://code.google.com/p/tesseract-ocr/wiki/APIExample >>>>>>>>>>>> [3] https://github.com/DImuthuUpe/Tesseract-API >>>>>>>>>>>> [4] >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://github.com/DImuthuUpe/Tesseract-API/blob/master/jni/tesseract/tessbaseapi.cpp >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> Dimuthu >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Mar 1, 2014 at 11:39 PM, DImuthu Upeksha < >>>>>>>>>>> [email protected] >>>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I updated necessary changes to the document [1] >>>>>>>>>>>>> >>>>>>>>>>>>> For last two days I had a deep look at this [2] jni wrapper for >>>>>>>>>>> tessaract >>>>>>>>>>>>> api. >>>>>>>>>>>>> Unfortunately this has been designed for Android environment so I >>>>>>>>> think >>>>>>>>>>> we >>>>>>>>>>>>> need to write our own make files to build this in to a >>>>>>>>>>>>> dll(windows) >>>>>>>>> or >>>>>>>>>>>>> dylib(in mac). Currently it has Android.mk files [3]. I'm >>>>>>>>>>>>> searching >>>>>>>>> for >>>>>>>>>>> a >>>>>>>>>>>>> way to convert it to a make file that we can run on console. 
>>>>>>>>>>>>> Please >>>>>>>>>>> suggest >>>>>>>>>>>>> if you have a better approach >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://www.dropbox.com/s/9qclvq26divwr2q/Optical%20Character%20Recognition%20for%20PDFBox%20-%20updated.pdf >>>>>>>>>>>>> [2] >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://code.google.com/p/tesseract-android-tools/source/browse/tesseract-android-tools/jni/com_googlecode_tesseract_android/ >>>>>>>>>>>>> [3] >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://code.google.com/p/tesseract-android-tools/source/browse/tesseract-android-tools/jni/com_googlecode_tesseract_android/Android.mk >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Mar 1, 2014 at 12:27 AM, John Hewson <[email protected]> >>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> This is a good start. However, there is no need for the Adder >>>>>>>>>>> component, >>>>>>>>>>>>>> "Extracted Text (OCR) can just feed back into the PDFBox "Text >>>>>>>>>>> Extractor". >>>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe show a "PDF" file feeding in to "Text Extractor, to make it >>>>>>>>> clear >>>>>>>>>>>>>> where the process starts. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- John >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 26 Feb 2014, at 16:53, DImuthu Upeksha < >>>>>>>>> [email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sorry for the mistake. I added it to my Dropbox [1]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://www.dropbox.com/s/y3m15rfjmw4eqij/Optical%20Character%20Recognition%20for%20PDFBox.pdf >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>> Dimuthu >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Feb 27, 2014 at 4:44 AM, John Hewson <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I should add that the OCR engine should be pluggable so >>>>>>>>>>>>>>>> PDFToText >>>>>>>>>>> might >>>>>>>>>>>>>>>> use an interface, e.g. 
OCREngine and there will be a >>>>>>>>>>> TesseractOCREngine >>>>>>>>>>>>>>>> class somewhere which provides the required functionality and >>>>>>>>> lives >>>>>>>>>>> in >>>>>>>>>>>>>> a >>>>>>>>>>>>>>>> separate jar file. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 25 Feb 2014, at 20:18, Dimuthu <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So do you need to embed those new functionalities into >>>>>>>>>>>>>>>>> existing >>>>>>>>>>>>>>>> PDFtoText algorithms or package them as a new sub >>>>>>>>>>>>>>>> system(something >>>>>>>>>>>>>> like an >>>>>>>>>>>>>>>> API)? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>>>> From: "John Hewson" <[email protected]> >>>>>>>>>>>>>>>>> Sent: 26/02/2014 07:38 >>>>>>>>>>>>>>>>> To: "[email protected]" <[email protected]> >>>>>>>>>>>>>>>>> Subject: Re: [GSoC 2014]Optical Character Recognition project >>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>> Introduction >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, exactly. By location data I just mean (x,y) coordinates >>>>>>>>>>>>>>>>> and >>>>>>>>>>> page >>>>>>>>>>>>>>>> rotation. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There is another use case for OCR: some fonts embedded in PDFs >>>>>>>>> have >>>>>>>>>>>>>>>> corrupt encodings, which means the ACSII codes map to the wrong >>>>>>>>>>>>>> glyphs. We >>>>>>>>>>>>>>>> could OCR the glyphs to repair the encoding. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- John >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 25 Feb 2014, at 17:13, DImuthu Upeksha < >>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>>>> Thanks for the explanation. >>>>>>>>>>>>>>>>>> Let's say there is a pdf with both text in extractable format >>>>>>>>> and >>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>> images with text(Scanned images). 
In that case first we >>>>>>>>>>>>>>>>>> extract >>>>>>>>>>> those >>>>>>>>>>>>>>>>>> extractable content using PDFBox algorithms and rest is >>>>>>>>> extracted >>>>>>>>>>>>>> using >>>>>>>>>>>>>>>>>> OCR. Finally we pack both results together and give output as >>>>>>>>>>>>>>>> PDFToText. Am >>>>>>>>>>>>>>>>>> I correct? What do you mean by "location data"? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>> Dimuthu >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Feb 25, 2014 at 11:22 PM, John Hewson < >>>>>>>>> [email protected]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. What is called "glyphs" ? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> http://en.wikipedia.org/wiki/Glyph >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2. What is the main requirement of this project? >>>>>>>>>>>>>>>>>>>> As far as I understood, first we need to generate an image >>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>> malformed pdfs from >>>>>>>>>>>>>>>>>>>> PDFBox and then we need to do processing using OCR for >>>>>>>>>>>>>>>>>>>> further >>>>>>>>>>>>>>>> accurate >>>>>>>>>>>>>>>>>>>> results. But the problem is, why shouldn't we directly do >>>>>>>>> OCR on >>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>>>>> PDFs without getting output from PDFBox? Correct me if I'm >>>>>>>>> wrong. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> PDFBox can generate images (PDFToImage) and can extract text >>>>>>>>>>>>>>>> (PDFToText). >>>>>>>>>>>>>>>>>>> The goal of >>>>>>>>>>>>>>>>>>> this project is to enhance PDFToText so that it can use OCR >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>> extract >>>>>>>>>>>>>>>>>>> text from areas of the >>>>>>>>>>>>>>>>>>> document where the text is embedded as an image. Such PDF >>>>>>>>>>>>>>>>>>> files >>>>>>>>>>> are >>>>>>>>>>>>>>>>>>> typically generated by >>>>>>>>>>>>>>>>>>> scanners or fax machines. 
There is also another case where >>>>>>>>>>>>>>>>>>> OCR >>>>>>>>> is >>>>>>>>>>>>>>>> useful: >>>>>>>>>>>>>>>>>>> some fonts embedded >>>>>>>>>>>>>>>>>>> in PDF files contain the wrong encoding, so when text is >>>>>>>>> extracted >>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>> PDFToText the result is >>>>>>>>>>>>>>>>>>> nonsense but when drawn with PDFToImage we see the correct >>>>>>>>>>> letters. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Instead of: >>>>>>>>>>>>>>>>>>> PDF => Image => OCR => Text >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We want to do: >>>>>>>>>>>>>>>>>>> PDF => (Many images for words + location data => OCR) => >>>>>>>>>>>>>>>>>>> Text >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- John >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Feb 25, 2014 at 1:35 PM, DImuthu Upeksha < >>>>>>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Ok fixed. This is what I did >>>>>>>>>>>>>>>>>>>>> Right click on the new project ->Debug As-> Debug >>>>>>>>> Configurations >>>>>>>>>>>>>>>>>>> ->Source >>>>>>>>>>>>>>>>>>>>> ->Add -> Project >>>>>>>>>>>>>>>>>>>>> Then I selected PDFBox project. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>>> Dimuthu >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Feb 25, 2014 at 1:17 PM, DImuthu Upeksha < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I'm using eclipse. This is what I want. I created a new >>>>>>>>>>>>>>>>>>>>>> Java >>>>>>>>>>>>>>>>>>> application >>>>>>>>>>>>>>>>>>>>>> project (say TestPDFBox) with a main class with following >>>>>>>>> code. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> PDDocument document = new PDDocument();PDPage blankPage = >>>>>>>>> new >>>>>>>>>>>>>>>>>>> PDPage();document.addPage( blankPage >>>>>>>>>>>>>>>>>>> );document.save("BlankPage.pdf");document.close(); >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Then I need to add those jar files generated in target >>>>>>>>> folder >>>>>>>>>>> of >>>>>>>>>>>>>>>> PDFBox >>>>>>>>>>>>>>>>>>>>>> to build path of my new project (I did build the PDFBox >>>>>>>>> project >>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>>>>> source). That is what I did. But let's say I need to >>>>>>>>>>>>>>>>>>>>>> check >>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> functionality of document.save("") method. But I don't >>>>>>>>>>>>>>>>>>>>>> have >>>>>>>>> a >>>>>>>>>>>>>>>>>>> reference to >>>>>>>>>>>>>>>>>>>>>> it's sources because I directly used generated jars. As >>>>>>>>> Tilman >>>>>>>>>>>>>> said >>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>> built >>>>>>>>>>>>>>>>>>>>>> PDFBox from sources but I don't know a proper way to use >>>>>>>>>>>>>>>>>>>>>> it >>>>>>>>>>> other >>>>>>>>>>>>>>>>>>> projects >>>>>>>>>>>>>>>>>>>>>> other than adding those jar files to build path. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 25, 2014 at 1:03 PM, John Hewson < >>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Which IDE are you using? You should be able to run the >>>>>>>>>>> PDFToText >>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>>>>>> (in pdfbox-tools) using your IDE and pass a PDF file >>>>>>>>>>>>>>>>>>>>>>> path >>>>>>>>> as >>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> command >>>>>>>>>>>>>>>>>>>>>>> line argument. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- John >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On 24 Feb 2014, at 22:38, DImuthu Upeksha < >>>>>>>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the reply. Yes I checked out PDFBox code and >>>>>>>>>>>>>> managed to >>>>>>>>>>>>>>>>>>>>>>> build >>>>>>>>>>>>>>>>>>>>>>>> code successfully. I looked at the classes you >>>>>>>>>>>>>>>>>>>>>>>> mentioned >>>>>>>>> and >>>>>>>>>>> I >>>>>>>>>>>>>>>> got a >>>>>>>>>>>>>>>>>>>>>>> rough >>>>>>>>>>>>>>>>>>>>>>>> idea about how they are working. To check them I used >>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>> jars >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>> target >>>>>>>>>>>>>>>>>>>>>>>> folder to my separate java project. I tried samples in >>>>>>>>>>>>>>>>>>>>>>>> http://pdfbox.apache.org/cookbook/. I need to further >>>>>>>>> look >>>>>>>>>>>>>> into >>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>>>> specially how those processXXX() methods work in >>>>>>>>>>>>>> PDFTextStripper >>>>>>>>>>>>>>>>>>> class. >>>>>>>>>>>>>>>>>>>>>>>> What I usually do is adding some berakpoints and >>>>>>>>>>>>>>>>>>>>>>>> checking >>>>>>>>>>> them >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> debug >>>>>>>>>>>>>>>>>>>>>>>> windows. But using jars it's not possible. What is the >>>>>>>>>>>>>>>>>>>>>>>> way >>>>>>>>>>> you >>>>>>>>>>>>>>>> follow >>>>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>>> order to do such task? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> As well I installed tesseract in to my machine and >>>>>>>>> managed to >>>>>>>>>>>>>> do >>>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>>>>>> OCR >>>>>>>>>>>>>>>>>>>>>>>> stuff also. That's a cool tool which works fine. >>>>>>>>>>>>>>>>>>>>>>>> I'm still learning the code. If I get any issue I'll >>>>>>>>>>>>>>>>>>>>>>>> drop >>>>>>>>>>> you a >>>>>>>>>>>>>>>> mail. 
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>>>>>> Dimuthu >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 25, 2014 at 12:33 AM, John Hewson < >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Dimuthu >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> The PDFBox website can be found at >>>>>>>>>>> http://pdfbox.apache.org/it >>>>>>>>>>>>>>>>>>>>>>> contains >>>>>>>>>>>>>>>>>>>>>>>>> a basic overview of the project >>>>>>>>>>>>>>>>>>>>>>>>> and details on how to obtain the source code and build >>>>>>>>>>> PDFBox >>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>>>> yourself. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Currently we do not perform any OCR and PDFBOX-1912 >>>>>>>>> details >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>>>>>>>> thoughts so far regarding it. >>>>>>>>>>>>>>>>>>>>>>>>> Note that the OCR libraries mentioned in the JIRA >>>>>>>>>>>>>>>>>>>>>>>>> issue >>>>>>>>> are >>>>>>>>>>>>>> all >>>>>>>>>>>>>>>>>>> under >>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>> Apache license, which is a >>>>>>>>>>>>>>>>>>>>>>>>> requirement. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Once you have the source code, take a look at the >>>>>>>>> PageDrawer >>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>>>>>>>>>>> how text and images are >>>>>>>>>>>>>>>>>>>>>>>>> rendered. We want someone to interface at a low-level >>>>>>>>> (e.g. >>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>> glyph, >>>>>>>>>>>>>>>>>>>>>>>>> word, or sentence at a time) with >>>>>>>>>>>>>>>>>>>>>>>>> an OCR engine. 
Also look at PDFTextStripper which is >>>>>>>>>>>>>>>>>>>>>>>>> how >>>>>>>>>>> text >>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>> currently >>>>>>>>>>>>>>>>>>>>>>>>> extracted, take a look at how >>>>>>>>>>>>>>>>>>>>>>>>> we have to go to great length to sort text back into >>>>>>>>> reading >>>>>>>>>>>>>>>> order >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>> infer the placement of diacritics - PDF >>>>>>>>>>>>>>>>>>>>>>>>> is fundamentally a visual format, not a structured >>>>>>>>>>>>>>>>>>>>>>>>> format >>>>>>>>>>> like >>>>>>>>>>>>>>>> HTML >>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>>>>>>>> which is why extracting text can be so >>>>>>>>>>>>>>>>>>>>>>>>> difficult sometimes. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> The full PDF Reference document can be found at: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Feel free to discuss specifics of your proposal or ask >>>>>>>>> any >>>>>>>>>>>>>>>>>>> questions. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> -- John >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 23 Feb 2014, at 21:13, DImuthu Upeksha < >>>>>>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>>>> I am Dimuthu Upeksha, a Computer Engineering >>>>>>>>> Undergraduate >>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>>>>> University >>>>>>>>>>>>>>>>>>>>>>>>> of Moratuwa Sri Lanka. I successfully completed my >>>>>>>>>>>>>>>>>>>>>>>>> GSoC >>>>>>>>> 2013 >>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>> Apache >>>>>>>>>>>>>>>>>>>>>>>>> ISIS [1] project. I'm very much interested in OCR and >>>>>>>>> image >>>>>>>>>>>>>>>>>>> processing >>>>>>>>>>>>>>>>>>>>>>>>> stuff. 
So I would like to select this project idea as >>>>>>>>>>>>>>>>>>>>>>>>> my >>>>>>>>>>> GSoC >>>>>>>>>>>>>>>> 2014 >>>>>>>>>>>>>>>>>>>>>>> project >>>>>>>>>>>>>>>>>>>>>>>>> because I feel like it is the best suited project for >>>>>>>>> me. In >>>>>>>>>>>>>>>>>>>>>>> university >>>>>>>>>>>>>>>>>>>>>>>>> also we have done some research in OCR area and our >>>>>>>>>>>>>>>>>>>>>>>>> group >>>>>>>>>>>>>> wrote a >>>>>>>>>>>>>>>>>>>>>>>>> literature review about increasing efficiency of OCR >>>>>>>>>>>>>>>>>>>>>>> systems(attached). Can >>>>>>>>>>>>>>>>>>>>>>>>> you please suggest me where to start learning about >>>>>>>>> PDFBox? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> http://google-opensource.blogspot.com/2013/10/google-summer-of-code-veteran-orgs.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+GoogleOpenSourceBlog+%28Google+Open+Source+Blog%29 >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thank you >>>>>>>>>>>>>>>>>>>>>>>>>> Dimuthu >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>>>>>> W.Dimuthu Upeksha >>>>>>>>>>>>>>>>>>>>>>>>>> Undergraduate >>>>>>>>>>>>>>>>>>>>>>>>>> Department of Computer Science And Engineering >>>>>>>>>>>>>>>>>>>>>>>>>> University of Moratuwa, Sri Lanka >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> W.Dimuthu Upeksha >>>>>>>>>>>>>>>>>>>>>>>> Undergraduate >>>>>>>>>>>>>>>>>>>>>>>> Department of Computer Science And Engineering >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> University of Moratuwa, Sri Lanka >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> W.Dimuthu Upeksha >>>>>>>>>>>>>>>>>>>>>> Undergraduate >>>>>>>>>>>>>>>>>>>>>> 
Department of Computer Science And Engineering >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> University of Moratuwa, Sri Lanka
