[jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

2014-01-03 Thread Chitrang Natu (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861345#comment-13861345
 ] 

Chitrang Natu commented on PDFBOX-1823:
---

Thanks, Andreas, for all of your advice and help.

> Apache PDFBox 1.6.0 TextStripper not able to recognise characters having 
> "Frutiger LT - 45" fonts
> -
>
> Key: PDFBOX-1823
> URL: https://issues.apache.org/jira/browse/PDFBOX-1823
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 1.6.0
> Environment: jdk1.6
> Reporter: Chitrang Natu
> Assignee: Andreas Lehmkühler
> Labels: newbie
> Attachments: PDF_With_Frutiger_font.pdf, 
> TC01_output.concat.MD302AE_Part2.doc, Test_Frutiger.java, 
> fontbox-checkstyle.xml, pdfbox-checkstyle.xml, pom.xml
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> When I try to extract content from PDFs, I can extract all of the text with 
> the PDFBox API except text set in 'Frutiger' fonts. For these I get square 
> boxes in place of the characters.
> It seems PDFBox FontBox supports only 14 UTF character sets, and none of them 
> covers Frutiger-style fonts. 
> Could anybody suggest something? That would be a great help. I urgently need 
> a solution.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

2014-01-02 Thread Chitrang Natu (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860268#comment-13860268
 ] 

Chitrang Natu commented on PDFBOX-1823:
---

Hi Andreas and Thomas,

I have uploaded the documents and the test program as well.
Please have a look and let me know if anything else is required.

Thanks!



[jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

2014-01-02 Thread Chitrang Natu (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860194#comment-13860194
 ] 

Chitrang Natu commented on PDFBOX-1823:
---

Hi Andreas,

As you suggested, I tried to save the text using Acrobat Reader, but there too I 
was unable to extract it. The result was mostly blank lines, two lines with a 
single dash each, and the characters:

 !"#$"! %!

Can you please explain what this means? And if PDFBox will not be able to 
extract it either, how should I proceed? Thanks.



[jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

2014-01-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860137#comment-13860137
 ] 

Andreas Lehmkühler commented on PDFBOX-1823:


I'm afraid no one can suggest a workaround for an unknown problem.

Did you check whether the text can be extracted at all? Try to save the text 
using Acrobat Reader. If that doesn't work, PDFBox most likely isn't able to 
extract the text either.
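
The "can the text be extracted at all?" check can also be applied to whatever 
string an extraction step returns. The following is only a minimal sketch, not 
part of PDFBox: the class name ExtractionCheck and the method suspiciousRatio 
are hypothetical, and the choice of "suspicious" characters (control 
characters, the U+FFFD replacement character, private-use and unassigned code 
points) is an assumption about how unmapped glyphs tend to show up as boxes. 
It uses only the standard Java library.

```java
// Hypothetical helper, not a PDFBox class: estimates how much of an
// extracted string consists of characters that usually indicate a
// failed glyph-to-Unicode mapping.
final class ExtractionCheck {

    /** Returns the fraction (0.0 to 1.0) of non-whitespace code points that look unmapped. */
    static double suspiciousRatio(String text) {
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue; // layout characters are not counted either way
            }
            total++;
            int type = Character.getType(cp);
            boolean unmapped = cp == 0xFFFD              // Unicode replacement character
                    || Character.isISOControl(cp)        // raw control codes
                    || type == Character.PRIVATE_USE     // private-use area
                    || type == Character.UNASSIGNED;     // not a defined character
            if (unmapped) {
                suspicious++;
            }
        }
        return total == 0 ? 0.0 : (double) suspicious / total;
    }

    public static void main(String[] args) {
        // Pass the extracted text as the first argument to get a rough score.
        String sample = args.length > 0 ? args[0] : "";
        System.out.println(suspiciousRatio(sample));
    }
}
```

A ratio close to 1.0 suggests the extracted text is unusable. Note that this 
check does not catch output made of ordinary printable punctuation, such as 
the run of symbols reported from Acrobat Reader in this thread, because those 
are valid characters that simply do not match the visible text.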



[jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

2014-01-02 Thread Chitrang Natu (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860087#comment-13860087
 ] 

Chitrang Natu commented on PDFBOX-1823:
---

Yes, I tried the latest PDFBox 1.8.3 and still have the issue, specifically 
with the Frutiger font; the rest of the text is extracted correctly.



[jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

2014-01-01 Thread Thomas Chojecki (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859905#comment-13859905
 ] 

Thomas Chojecki commented on PDFBOX-1823:
-

Did you try the latest PDFBox 1.8.3? Maybe this issue has been fixed there.

If the issue still remains, please attach a sample document and code so we can 
reproduce the issue.
