[
https://issues.apache.org/jira/browse/PDFBOX-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Hewson updated PDFBOX-540:
-------------------------------
Component/s: Utilities (was: Swing GUI)
> New functionality for class inherited from PDFTextStripperByArea class
> ----------------------------------------------------------------------
>
> Key: PDFBOX-540
> URL: https://issues.apache.org/jira/browse/PDFBOX-540
> Project: PDFBox
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 0.7.3
> Environment: Windows Vista
> Reporter: Alexander Shvartz
>
> We have been working with the org.apache.pdfbox.util.PDFTextStripperByArea
> class. Using the Rectangle class and PDFTextStripperByArea methods such as
> getTextForRegion(), we obtained the text found in a given region (in our
> case, on a specific PDF page).
> Our goal was to connect PDFTextStripperByArea with the TextPosition class,
> which provides per-character accessors such as getX() and getY(). For this
> we intended to use getCharactersByArticle(). That method is protected in
> PDFTextStripper, but we need to call it on a PDFTextStripperByArea instance.
> We therefore suggest creating a new class, named for example
> PDFTextStripperByAreaChar, that extends PDFTextStripperByArea and adds a
> public getCharactersByArticle() method:
> // Subclass of PDFTextStripperByArea that adds a public
> // getCharactersByArticle() accessor, modelled on the protected method
> // in PDFTextStripper.
> package org.apache.pdfbox.util;
>
> import java.io.IOException;
> import java.util.List;
>
> public class PDFTextStripperByAreaChar extends PDFTextStripperByArea
> {
>     public PDFTextStripperByAreaChar() throws IOException
>     {
>         super();
>     }
>
>     // Exposes the protected charactersByArticle field inherited from
>     // PDFTextStripper: one list of TextPosition objects per article.
>     public List getCharactersByArticle()
>     {
>         return charactersByArticle;
>     }
> }
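For readers without PDFBox on the classpath, the same accessor pattern can be sketched with stand-in classes. Everything below is an assumption made for illustration: BaseStripper, CharStripper and Glyph are hypothetical names, not PDFBox classes, and the read-only wrapper is an optional extra that the proposal above does not include.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Self-contained sketch. BaseStripper, CharStripper and Glyph are
// hypothetical stand-ins, not PDFBox classes.
class AccessorSketch {
    // Stand-in for TextPosition: one character and its page coordinates.
    static final class Glyph {
        final String character;
        final float x;
        final float y;

        Glyph(String character, float x, float y) {
            this.character = character;
            this.x = x;
            this.y = y;
        }
    }

    // Stand-in for the base stripper: the collected characters are
    // visible to subclasses only.
    static class BaseStripper {
        protected final List<List<Glyph>> charactersByArticle = new ArrayList<>();

        public void addArticle(List<Glyph> glyphs) {
            charactersByArticle.add(new ArrayList<>(glyphs));
        }
    }

    // Mirrors the proposed subclass: a public accessor for the inherited
    // protected field. The outer list is wrapped so callers cannot add or
    // remove articles; the inner lists are not wrapped.
    static class CharStripper extends BaseStripper {
        public List<List<Glyph>> getCharactersByArticle() {
            return Collections.unmodifiableList(charactersByArticle);
        }
    }

    public static void main(String[] args) {
        CharStripper stripper = new CharStripper();
        stripper.addArticle(Arrays.asList(
                new Glyph("A", 10f, 20f), new Glyph("B", 16f, 20f)));
        for (List<Glyph> article : stripper.getCharactersByArticle()) {
            for (Glyph g : article) {
                System.out.println(g.character + " at (" + g.x + ", " + g.y + ")");
            }
        }
    }
}
```

Typing the accessor as a list of lists removes the casts at the call site. Whether the real field can be typed this way depends on the PDFBox version in use.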
> Example usage:
> PDFTextStripperByAreaChar stripperText = new PDFTextStripperByAreaChar();
> // Following the idea in the getTitle() method of
> // org.apache.pdfbox.util.PDFText2HTML, we can run the code below to get the
> // X and Y coordinates of each character. It needs imports for
> // java.util.Iterator, java.util.List and org.apache.pdfbox.util.TextPosition.
> Iterator textIter = stripperText.getCharactersByArticle().iterator();
> String charPDF;
> while (textIter.hasNext())
> {
>     Iterator textByArticle = ((List) textIter.next()).iterator();
>     int j = 1; // 1-based character index within the current article
>     while (textByArticle.hasNext())
>     {
>         TextPosition text = (TextPosition) textByArticle.next();
>         charPDF = text.getCharacter();
>         System.out.println("Char " + j + ": |" + charPDF +
>             "| X = " + text.getX() + ", Y = " + text.getY());
>         j++;
>     }
> }
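Once the per-article character lists are reachable, tying them back to a region is a small step. The sketch below is only an illustration under stated assumptions: Glyph is a hypothetical stand-in for TextPosition, glyphsInRegion is a made-up helper, and testing each (x, y) with Rectangle2D.contains is one plausible way to match characters to a region, not necessarily what PDFTextStripperByArea does internally.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Self-contained sketch. Glyph and glyphsInRegion are hypothetical,
// not part of PDFBox.
class RegionFilterSketch {
    // Stand-in for TextPosition: one character and its page coordinates.
    static final class Glyph {
        final String character;
        final float x;
        final float y;

        Glyph(String character, float x, float y) {
            this.character = character;
            this.x = x;
            this.y = y;
        }
    }

    // Collects, in reading order, every glyph from every article whose
    // (x, y) point lies inside the given region.
    static List<Glyph> glyphsInRegion(List<List<Glyph>> charactersByArticle,
                                      Rectangle2D region) {
        List<Glyph> hits = new ArrayList<>();
        for (List<Glyph> article : charactersByArticle) {
            for (Glyph g : article) {
                if (region.contains(g.x, g.y)) {
                    hits.add(g);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<Glyph>> articles = new ArrayList<>();
        articles.add(Arrays.asList(new Glyph("H", 10f, 20f), new Glyph("i", 16f, 20f)));
        articles.add(Arrays.asList(new Glyph("Z", 300f, 400f)));

        // Region covering the top-left 100 x 100 units of the page.
        Rectangle2D region = new Rectangle2D.Float(0f, 0f, 100f, 100f);
        for (Glyph g : glyphsInRegion(articles, region)) {
            System.out.println("|" + g.character + "| X = " + g.x + ", Y = " + g.y);
        }
    }
}
```

With real PDFBox objects, the same loop would read text.getX() and text.getY() from each TextPosition instead of the stand-in fields.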
> Thank you.
> DeepDyve developers:
> Alexander Shvartz,
> Raza Mobin,