[ 
https://issues.apache.org/jira/browse/PDFBOX-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved PDFBOX-30.
---------------------------------

    Resolution: Duplicate

As mentioned above, most of these issues seem to already have been fixed.

> some code  bugs
> ---------------
>
>                 Key: PDFBOX-30
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-30
>             Project: PDFBox
>          Issue Type: Bug
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1084937
> Originally submitted by jinfeng_wang on 2004-12-13 22:50.
> Hi Ben, I have recently been reading your code. After testing,
> I found the following bugs in the PDFBox 0.6.7 release
> version.
> 1) Code in org.pdfbox.cmaptypes.CMap:
>   The methods "lookup" and "addMapping" compute the "key"
> inconsistently when the length is 2.
>    The code in "lookup":
>      int intKey = (code[offset]+256)%256;
>      intKey <<= 8;
>      intKey += (code[offset+1]+256)%256;
>      key = new Integer( intKey );
>    The code in "addMapping":
>      int intSrc = src[0];
>      intSrc <<= 8;
>      intSrc |= (src[1]&0xFF);
>      doubleByteMappings.put( new Integer( intSrc ), dest );
>
> 2) Code in org.pdfbox.pdmodel.font.PDFont.encode():
>    When the PDF file contains a "ToUnicode" CMap, the method
> parses the "ToUnicode" CMap again for every character. The
> parsed CMap is not stored for later use.
>    The code is:
>      COSStream toUnicode =
>          (COSStream) font.getDictionaryObject(
>              COSName.getPDFName("ToUnicode"));
>      if (toUnicode != null) {
>          parseCmap(toUnicode.getUnfilteredStream(), null);
> 3) Code in org.pdfbox.pdmodel.font.PDFont.encode():
>   When the current font is Type0, it should parse TWO
> CMap files according to the PDF Reference, but this is
> neglected in the release version.
> 4) Code in org.pdfbox.cmapparser.CMapParser.parse():
>   The code omits the case
> "if (op.getOperation().equals(BEGIN_CID_RANGE))", which is
> very important for "Type0" fonts.
> 5) Code in org.pdfbox.cmapparser.CMapParser.equal():
>    I have downloaded the CVS code from SourceForge, where
> this function is renamed to "lessThanOrEqual".
>    However, I found a bug in this "lessThanOrEqual" member
> function. When I try to parse the CMap "UniCNS-UCS2-H"
> with "lessThanOrEqual", the return value is always
> TRUE, so the loop "while (!equals(startBytes, endBytes))"
> never terminates.
> 6) The text of the PDF file in the attachment cannot be
> extracted correctly in either the last release version or
> the CVS development version.
>     Could you please explain the algorithm for
> the "blank space" in more detail?
>     I have noticed that the "TextPosition" class has
> changed in the development version compared to the
> release version. For now I have commented out the "for loop"
> in "PDFTextStripper.flush()" in the release version,
> and the running speed is OK when extracting the "J2EE
> tutorial". :-)
>     Could you please tell me more about the algorithm?
> Thanks.
> By the way, if you like, I will email you my code for
> extracting "Type0" fonts.
>   
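
The mismatch described in item 1 can be reproduced in isolation. The sketch below copies the two key computations quoted in the report into standalone methods and compares them with a masked variant. The class name CMapKeyDemo and all method names are hypothetical and are not part of PDFBox; the sample bytes are chosen only so that the high byte is 0x80 or above, where sign extension of src[0] matters.

```java
public class CMapKeyDemo {
    // Key computation as quoted from "lookup" in the report.
    static int lookupKeyAsReported(byte[] code, int offset) {
        int intKey = (code[offset] + 256) % 256;
        intKey <<= 8;
        intKey += (code[offset + 1] + 256) % 256;
        return intKey;
    }

    // Key computation as quoted from "addMapping" in the report.
    // src[0] is widened to int with its sign, so a high byte >= 0x80
    // yields a negative key.
    static int addMappingKeyAsReported(byte[] src) {
        int intSrc = src[0];
        intSrc <<= 8;
        intSrc |= (src[1] & 0xFF);
        return intSrc;
    }

    // Hypothetical shared helper: mask both bytes so that both call
    // sites would derive the same non-negative key.
    static int twoByteKey(byte[] code, int offset) {
        return ((code[offset] & 0xFF) << 8) | (code[offset + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] code = {(byte) 0xA4, (byte) 0x40};
        System.out.println(lookupKeyAsReported(code, 0));   // 42048 (0xA440)
        System.out.println(addMappingKeyAsReported(code));  // -23488
        System.out.println(twoByteKey(code, 0));            // 42048 (0xA440)
    }
}
```

Using one helper for both insertion and lookup would remove the possibility of the two computations drifting apart; whether the later PDFBox fix took this form is not stated in this thread.
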
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> Thanks for the report, I will take a look at these. Many bugs
> have been fixed since the 0.6.7 release; please try the
> nightly release.
> Ben
> [comment on SourceForge]
> Originally sent by jinfeng_wang.
> Logged In: YES 
> user_id=1145721
> Uploaded the error PDF file.
> [comment on SourceForge]
> Originally sent by jinfeng_wang.
> Logged In: YES 
> user_id=1145721
> Sorry, I have not uploaded the "Error" PDF file.
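
For item 5 of the report, the thread does not show the body of "lessThanOrEqual", so the actual cause of the endless loop is unknown. One plausible cause for CMaps with bytes of 0x80 and above, such as "UniCNS-UCS2-H", is comparing Java bytes as signed values. The sketch below is only an illustration under that assumption: it walks a byte range with an unsigned comparison and an explicit stop at the end value. The class ByteRangeDemo and its methods are hypothetical and assume both arrays have the same length.

```java
public class ByteRangeDemo {
    // Compare two equal-length big-endian byte arrays as unsigned numbers.
    // Returns a negative value, zero, or a positive value.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }

    // Add one to a big-endian byte array in place, propagating the carry.
    static void increment(byte[] data) {
        for (int i = data.length - 1; i >= 0; i--) {
            data[i]++;
            if (data[i] != 0) {
                return; // no carry into the next byte
            }
        }
    }

    // Count the codes in the inclusive range [start, end].
    static int countRange(byte[] start, byte[] end) {
        byte[] current = start.clone();
        int count = 0;
        while (compareUnsigned(current, end) <= 0) {
            count++;
            if (compareUnsigned(current, end) == 0) {
                break; // stop at end so an all-0xFF end cannot wrap around
            }
            increment(current);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0x00, (byte) 0xFE};
        byte[] end = {(byte) 0x01, (byte) 0x01};
        System.out.println(countRange(start, end)); // 4
    }
}
```

The explicit equality check inside the loop is a design choice: without it, a range ending at the maximum value would overflow back to zero and the loop would start over.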

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
