[ 
https://issues.apache.org/jira/browse/PDFBOX-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3127:
------------------------------------
    Attachment: RAU4G6QMOVRYBISJU7R6MOVZCRFUO7P4-marked-1.png
                RAU4G6QMOVRYBISJU7R6MOVZCRFUO7P4.pdf

> Text with vertical font not extracted correctly
> -----------------------------------------------
>
>                 Key: PDFBOX-3127
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3127
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 1.8.11, 2.0.0
>            Reporter: Tilman Hausherr
>         Attachments: RAU4G6QMOVRYBISJU7R6MOVZCRFUO7P4-marked-1.png, 
> RAU4G6QMOVRYBISJU7R6MOVZCRFUO7P4.pdf
>
>
> The attached file has a vertical font, although the text is horizontal.
> Extraction with 1.8:
> {quote}
> NOTI CE OF PUBLI C HEARI NG 
>  The Sout h Caroli na Depart ment of I nsurance will hol d a publi c 
> heari ng i n accordance wit h t he require ments of Secti on 38-3-
> 110? 5?  Thursday, April 29, 2010 at The Conf erence and Busi ness 
> Cent er at t he Grand Strand Ca mpus of t he Horry- Georgetown 
> Techni cal Coll ege, 950 Crabtree Lane, Myrtl e Beach, S. C., 29577 
> fro m 5: 30 p. m.-7: 00 p. m.    The purpose of t hi s heari ng i s t o provi 
> de 
> an opportunity t o di scuss and off er i nput concerni ng t he st atus of t 
> he 
> coastal property i nsurance market.  The Conf erence Cent er i s l ocat ed 
> one mil e sout h of t he Myrtl e Beach I nt ernati onal Airport bet ween 
> Hi ghway 17 Busi ness and Hi ghway 17 Bypass.  The t el ephone 
> nu mber f or t he Conf erence and Busi ness Cent er i s 843-477-2042. 
> {quote}
> Extraction with 2.0:
> {quote}
> N O T I C E  O F  P U B L I C  H E A R I N G  
>  
> T h e  S o u t h  C a r o l i n a  D e p a r t m e n t  o f  I n s u r a n c 
> e  w i l l  h o l d  a  p u b l i c  
> h e a r i n g  i n  a c c o r d a n c e  w i t h  t h e  r e q u i r e m e n 
> t s  o f  S e c t i o n  3 8 - 3 -
> 1 1 0 ︵5 ︶ T h u r s d a y ,  A p r i l  2 9 ,  2 0 1 0  a t  T h e  C o n f 
> e r e n c e  a n d  B u s i n e s s  
> C e n t e r  a t  t h e  G r a n d  S t r a n d  C a m p u s  o f  t h e  H o 
> r r y - G e o r g e t o w n  
> T e c h n i c a l  C o l l e g e ,  9 5 0  C r a b t r e e  L a n e ,  M y r 
> t l e  B e a c h ,  S . C . ,  2 9 5 7 7  
> f r o m  5 : 3 0  p . m . - 7 : 0 0  p . m .     T h e  p u r p o s e  o f  t 
> h i s  h e a r i n g  i s  t o  p r o v i d e  
> a n  o p p o r t u n i t y  t o  d i s c u s s  a n d  o f f e r  i n p u t  
> c o n c e r n i n g  t h e  s t a t u s  o f  t h e  
> c o a s t a l  p r o p e r t y  i n s u r a n c e  m a r k e t .   T h e  C o 
> n f e r e n c e  C e n t e r  i s  l o c a t e d  
> o n e  m i l e  s o u t h  o f  t h e  M y r t l e  B e a c h  I n t e r n a 
> t i o n a l  A i r p o r t  b e t w e e n  
> H i g h w a y  1 7  B u s i n e s s  a n d  H i g h w a y  1 7  B y p a s s . 
>   T h e  t e l e p h o n e  
> n u m b e r  f o r  t h e  C o n f e r e n c e  a n d  B u s i n e s s  C e n 
> t e r  i s  8 4 3 - 4 7 7 - 2 0 4 2 .  
> {quote}
> A brute-force change that uses the correct width, and that works only with 
> this file, produces this:
> {quote}
> NOTICE OF PUBLIC HEARING 
>  
> The South Carolina Department of Insurance will hold a public 
> hearing in accordance with the requirements of Section 38-3-
> 110 ︵5 ︶ Thursday, April 29, 2010 at The Conference and Business 
> Center at the Grand Strand Campus of the Horry-Georgetown 
> Technical College, 950 Crabtree Lane, Myrtle Beach, S.C., 29577 
> from 5:30 p.m.-7:00 p.m.    The purpose of this hearing is to provide 
> an opportunity to discuss and offer input concerning the status of the 
> coastal property insurance market.  The Conference Center is located 
> one mile south of the Myrtle Beach International Airport between 
> Highway 17 Business and Highway 17 Bypass.  The telephone 
> number for the Conference and Business Center is 843-477-2042. 
> {quote}
> The problem is that PDFTextStreamEngine doesn't work well with vertical 
> fonts. The red lines in the attached image show that the size is only half of 
> what's needed. It may be related to PDCIDFont.getDefaultPositionVector(), but 
> changing that isn't enough.
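
To illustrate why a glyph width that is only half of what's needed could produce the letter-spaced 2.0 output above, here is a minimal, self-contained sketch. It is not PDFBox code: the `Glyph` class, the `layout` and `join` helpers, and the 0.5 gap factor are all hypothetical, and it simply assumes a text stripper inserts a space whenever the gap after a glyph exceeds a fraction of that glyph's reported width.

```java
public class GlyphGapSketch {
    /** A glyph as a text stripper might see it (hypothetical model, not a PDFBox class). */
    static final class Glyph {
        final char ch;
        final float x;      // start position along the baseline
        final float width;  // advance width as reported by the font

        Glyph(char ch, float x, float width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /** Places each character 'advance' units apart, all with the same reported width. */
    static Glyph[] layout(String text, float advance, float reportedWidth) {
        Glyph[] glyphs = new Glyph[text.length()];
        for (int i = 0; i < text.length(); i++) {
            glyphs[i] = new Glyph(text.charAt(i), i * advance, reportedWidth);
        }
        return glyphs;
    }

    /**
     * Joins glyphs into a string, inserting a space when the gap between the end
     * of the previous glyph and the start of the next one is larger than
     * gapFactor times the previous glyph's reported width (assumed heuristic).
     */
    static String join(Glyph[] glyphs, float gapFactor) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > gapFactor * prev.width) {
                    sb.append(' ');
                }
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reported width matches the real advance: no gaps, no extra spaces.
        System.out.println(join(layout("NOTICE", 10f, 10f), 0.5f)); // prints "NOTICE"
        // Reported width is half the real advance: every gap looks like a word break.
        System.out.println(join(layout("NOTICE", 10f, 5f), 0.5f));  // prints "N O T I C E"
    }
}
```

Under these assumptions, halving the reported width turns every inter-glyph gap into an apparent word break, which matches the shape of the 2.0 output but says nothing about where in PDFBox the wrong width originates.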



