Re: 2.0

John Hewson Thu, 23 Oct 2014 11:39:12 -0700

Do we have a JIRA issue for these, or shall I create one?

-- John


On 14 Oct 2014, at 09:18, Tilman Hausherr wrote:

> Here are some:
> 
> 055/055794.pdf
> 082/082463.pdf
> 108/108362.pdf
> 113/113223.pdf
> 115/115458.pdf
> 115/115463.pdf
> 122/122393.pdf
> 129/129416.pdf
> 133/133423.pdf
> 148/148020.pdf
> 152/152012.pdf
> 161/161466.pdf
> 
> They can be found here:
> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/ 
> 
> Tilman
> 
> On 14 Oct 2014 at 21:06, John Hewson wrote:
>> Unless somebody provides us with a list of those files, I think this is an
>> unreasonable request. As long as we leave the old parser in PDFBox, we
>> won’t get the bug reports we need to fix the new parser, and the situation
>> will never resolve itself. Falling back to the old parser is just as bad:
>> we won’t get bug reports.
>> 
>> -- John
>> 
>> On 14 Oct 2014, at 07:39, Tilman Hausherr wrote:
>> 
>>> I would prefer that the "old" parser not be removed, because there are many
>>> files that can only be parsed by the old parser. This emerged from a
>>> large-scale test with Tika.
>>> 
>>> The best idea (in my current opinion) is to try the nonSeq parser first,
>>> and fall back to the old parser if it throws an exception.
>>> 
>>> Tilman
>>> 
>>> On 14 Oct 2014 at 09:45, Timo Boehme wrote:
>>>> Hi,
>>>> 
>>>> On 14 Oct 2014 at 07:22, John Hewson wrote:
>>>>> Hi,
>>>>>>> On 10 October 2014 at 20:05, John Hewson wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>        - Parsing (Andreas?)
>>>>>> I guess we won't get a completely new parser in 2.0, but I'll try to
>>>>>> improve the XRef and COSStream handling.
>>>>> It would be great if we could get rid of the old parser and switch to the
>>>>> non-sequential parser. WDYT?
>>>> I would also propose removing the old parser completely. That way we would
>>>> be more flexible in parsing streams etc., since parts of the
>>>> non-sequential parser are a compromise made so it can work side by side
>>>> with the old parser. There may be a small number of functions for which
>>>> the old parser is still needed, e.g. signing?
>>>> 
>>>> 
>>>> Best,
>>>> Timo
>>>> 
>>>> 
>> 
>
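
Tilman's suggestion (try the non-sequential parser first, then fall back to the old parser on an exception) can be sketched as below. This is a minimal, hypothetical sketch: `ParseStrategy`, `parseWithFallback` and the `onPrimaryFailure` callback are illustrative names, not PDFBox API, and the two parsers are stand-in lambdas rather than real PDFBox calls.

```java
import java.io.IOException;
import java.util.function.Consumer;

// Hypothetical sketch: ParseStrategy and parseWithFallback are illustrative
// names, not PDFBox API. The two parsers below are stand-in lambdas.
public class ParserFallback {

    // A parsing strategy that turns raw bytes into a result or fails.
    @FunctionalInterface
    interface ParseStrategy<T> {
        T parse(byte[] input) throws IOException;
    }

    // Tries the primary parser first. If it throws an IOException, the failure
    // is handed to onPrimaryFailure (so it can be logged or reported instead
    // of being silently lost) and the fallback parser is tried. If the fallback
    // also fails, its exception is rethrown with the primary failure attached
    // as a suppressed exception.
    static <T> T parseWithFallback(byte[] input,
                                   ParseStrategy<T> primary,
                                   ParseStrategy<T> fallback,
                                   Consumer<IOException> onPrimaryFailure) throws IOException {
        try {
            return primary.parse(input);
        } catch (IOException primaryFailure) {
            onPrimaryFailure.accept(primaryFailure);
            try {
                return fallback.parse(input);
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the non-sequential parser: always fails in this demo.
        ParseStrategy<String> nonSeq = in -> {
            throw new IOException("non-sequential parser failed");
        };
        // Stand-in for the old parser: always succeeds in this demo.
        ParseStrategy<String> old = in -> "parsed by old parser";

        String result = parseWithFallback(new byte[0], nonSeq, old,
                e -> System.err.println("Fallback used: " + e.getMessage()));
        System.out.println(result); // prints "parsed by old parser"
    }
}
```

The `onPrimaryFailure` hook is a design choice of this sketch: a plain try/catch fallback would swallow the non-sequential parser's error whenever the old parser succeeds, which is exactly the loss of bug reports John objects to. Catching only `IOException` is another assumption; a harness like the Tika test might also want to catch `RuntimeException`.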
