Ok, thanks... I see I shouldn't have bothered at all. I read the wrong column and thought that there was LESS in 2.0.8, but obviously there isn't, which for some reason I didn't catch despite your post and looking at the JSON. Maybe it's getting late :-(
Tilman

On 10.10.2017 at 22:13, Allison, Timothy B. wrote:
However, PDFBox 2.0.8-SNAPSHOT has more 0s, 1s, 2s and 3s...

The TOP_10_MORE_IN_B column in the contents report shows that there are 15 more 
0s, 15 more 1s, 11 more 2s, etc.

0: 15 | 1: 15 | 2: 11 | 20: 5 | 3: 2 | 4: 2
Yeah, but where do they come from? Not from the pure text extraction. In the 
JSON files, I see that there are
many "0:", "1:" in the new file. I wonder if this is about AcroForm fields? Can 
be seen e.g. near
b12c96nfdate36.
Sorry, right, AcroForm.  We're now getting some children we weren't before.

2.0.8-SNAPSHOT:
        <li altName="date362">@@b12c96nfdate362: </li>
        <ol>
        <li altName="date362">0:   </li>
        <li altName="date362">1:   </li>
        <li altName="date362">2: 20  </li>
        </ol>
        <li altName="date362">b12c96nfdate362:     20</li>
2.0.7:
        <li altName="date362">@@b12c96nfdate362: </li>
        <li altName="date362">b12c96nfdate362:     20</li>
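
For anyone following along, here is a minimal sketch of how a "more in B" token count like the TOP_10_MORE_IN_B column discussed above could be computed. This is not tika-eval's actual implementation: the function name `top_more_in_b`, the simple `\w+` tokenizer, and the two sample strings (condensed from the 2.0.7 and 2.0.8-SNAPSHOT snippets above) are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a plain alphanumeric tokenizer; the real report may
    # tokenize differently.
    return re.findall(r"\w+", text.lower())


def top_more_in_b(text_a, text_b, n=10):
    """Return up to n (token, extra_count) pairs for tokens that occur
    more often in text_b than in text_a."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Counter subtraction keeps only tokens with a positive difference.
    more_in_b = counts_b - counts_a
    return more_in_b.most_common(n)


# Hypothetical inputs condensed from the two extractions quoted above.
text_207 = "@@b12c96nfdate362: b12c96nfdate362: 20"
text_208 = "@@b12c96nfdate362: 0: 1: 2: 20 b12c96nfdate362: 20"

for token, extra in top_more_in_b(text_207, text_208):
    print(f"{token}: {extra}")
```

With these inputs, the extra child entries account for the additional "0", "1", "2" and "20" tokens, which is the same pattern as in the report row.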

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org


