[ https://issues.apache.org/jira/browse/PDFBOX-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-2246:
------------------------------------
    Description:
A recent thread in the dev mailing list (with Aaron H.) dealt with the inability to extract color with PDFTextStripper. The solution was to create a PDFTextStripper with these entries added to the properties file
{code}
CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace
cs=org.apache.pdfbox.util.operator.SetNonStrokingColorSpace
G=org.apache.pdfbox.util.operator.SetStrokingGrayColor
g=org.apache.pdfbox.util.operator.SetNonStrokingGrayColor
K=org.apache.pdfbox.util.operator.SetStrokingCMYKColor
k=org.apache.pdfbox.util.operator.SetNonStrokingCMYKColor
RG=org.apache.pdfbox.util.operator.SetStrokingRGBColor
rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
SC=org.apache.pdfbox.util.operator.SetStrokingColor
sc=org.apache.pdfbox.util.operator.SetNonStrokingColor
SCN=org.apache.pdfbox.util.operator.SetStrokingColor
scn=org.apache.pdfbox.util.operator.SetNonStrokingColor
{code}
I therefore propose (and I'd like to get at least one "+1" before starting because I've never worked on that segment before):
- replacing the empty entries in the PDFTextStripper property file with the ones above
- improving the printtextlocations example

The problem has come up before: PDFBOX-1736, http://stackoverflow.com/q/10844271/535646 , http://stackoverflow.com/a/9157714/535646 and the solutions presented are rather cumbersome (using a PageDrawer object).

  was:
A recent thread in the dev mailing list (with Aaron H.) dealt with the inability to extract color with PDFTextStripper.
The solution was to create a PDFTextStripper with these entries added to the properties file
{code}
CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace
cs=org.apache.pdfbox.util.operator.SetNonStrokingColorSpace
rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
G=org.apache.pdfbox.util.operator.SetStrokingGrayColor
g=org.apache.pdfbox.util.operator.SetNonStrokingGrayColor
K=org.apache.pdfbox.util.operator.SetStrokingCMYKColor
k=org.apache.pdfbox.util.operator.SetNonStrokingCMYKColor
RG=org.apache.pdfbox.util.operator.SetStrokingRGBColor
rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
SC=org.apache.pdfbox.util.operator.SetStrokingColor
sc=org.apache.pdfbox.util.operator.SetNonStrokingColor
SCN=org.apache.pdfbox.util.operator.SetStrokingColor
scn=org.apache.pdfbox.util.operator.SetNonStrokingColor
{code}
I therefore propose (and I'd like to get at least one "+1" before starting because I've never worked on that segment before):
- replacing the empty entries in the PDFTextStripper property file with the ones above
- improving the printtextlocations example

The problem has come up before: PDFBOX-1736, http://stackoverflow.com/q/10844271/535646 , http://stackoverflow.com/a/9157714/535646 and the solutions presented are rather cumbersome (using a PageDrawer object).


> PDFTextStripper should handle colors
> ------------------------------------
>
>                 Key: PDFBOX-2246
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2246
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 1.8.6, 1.8.7, 2.0.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> A recent thread in the dev mailing list (with Aaron H.) dealt with the
> inability to extract color with PDFTextStripper.
> The solution was to create a
> PDFTextStripper with these entries added to the properties file
> {code}
> CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace
> cs=org.apache.pdfbox.util.operator.SetNonStrokingColorSpace
> G=org.apache.pdfbox.util.operator.SetStrokingGrayColor
> g=org.apache.pdfbox.util.operator.SetNonStrokingGrayColor
> K=org.apache.pdfbox.util.operator.SetStrokingCMYKColor
> k=org.apache.pdfbox.util.operator.SetNonStrokingCMYKColor
> RG=org.apache.pdfbox.util.operator.SetStrokingRGBColor
> rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor
> SC=org.apache.pdfbox.util.operator.SetStrokingColor
> sc=org.apache.pdfbox.util.operator.SetNonStrokingColor
> SCN=org.apache.pdfbox.util.operator.SetStrokingColor
> scn=org.apache.pdfbox.util.operator.SetNonStrokingColor
> {code}
> I therefore propose (and I'd like to get at least one "+1" before starting
> because I've never worked on that segment before):
> - replacing the empty entries in the PDFTextStripper property file with the
> ones above
> - improving the printtextlocations example
> The problem has come up before: PDFBOX-1736,
> http://stackoverflow.com/q/10844271/535646 ,
> http://stackoverflow.com/a/9157714/535646 and the solutions presented are
> rather cumbersome (using a PageDrawer object).
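
For illustration only, the operator entries from the description can be loaded into a java.util.Properties object, which is roughly what a custom properties file would supply to the stripper. This is a minimal sketch, not the project's implementation: the class name ColorOperatorMapping and the method colorOperators() are hypothetical, and the way the result is handed to PDFTextStripper (mentioned only in a comment) is an assumption about the 1.8.x API. The sketch itself uses only the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class; not part of PDFBox.
public class ColorOperatorMapping {

    // Operator-to-class entries copied from the issue description.
    static final String ENTRIES = String.join("\n",
        "CS=org.apache.pdfbox.util.operator.SetStrokingColorSpace",
        "cs=org.apache.pdfbox.util.operator.SetNonStrokingColorSpace",
        "G=org.apache.pdfbox.util.operator.SetStrokingGrayColor",
        "g=org.apache.pdfbox.util.operator.SetNonStrokingGrayColor",
        "K=org.apache.pdfbox.util.operator.SetStrokingCMYKColor",
        "k=org.apache.pdfbox.util.operator.SetNonStrokingCMYKColor",
        "RG=org.apache.pdfbox.util.operator.SetStrokingRGBColor",
        "rg=org.apache.pdfbox.util.operator.SetNonStrokingRGBColor",
        "SC=org.apache.pdfbox.util.operator.SetStrokingColor",
        "sc=org.apache.pdfbox.util.operator.SetNonStrokingColor",
        "SCN=org.apache.pdfbox.util.operator.SetStrokingColor",
        "scn=org.apache.pdfbox.util.operator.SetNonStrokingColor");

    // Parses the entries in properties-file syntax. Keys are case-sensitive,
    // so the stroking (upper-case) and non-stroking (lower-case) operators
    // remain distinct entries.
    static Properties colorOperators() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(ENTRIES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = colorOperators();
        // Assumption: with PDFBox 1.8.x on the classpath, these entries would
        // be merged with the stripper's default operator mappings and passed
        // to a PDFTextStripper constructor that accepts a Properties object.
        System.out.println("rg -> " + props.getProperty("rg"));
        System.out.println("RG -> " + props.getProperty("RG"));
    }
}
```

On their own these twelve entries cover only the color operators; a stripper configured with nothing else would presumably lose its text-handling operators, which is why the proposal is to fill in the empty entries of the existing property file rather than replace it.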