Re: 2.0

John Hewson Thu, 23 Oct 2014 11:43:22 -0700

That’s very good news!

-- John


> On 23 Oct 2014, at 11:40, Tilman Hausherr <[email protected]> wrote:
> 
> This is now obsolete, thanks to Andreas having resolved PDFBOX-2250.
> 
> Tilman
> 
> On 23.10.2014 at 09:33, John Hewson wrote:
>> Do we have a JIRA issue for these, or shall I create one?
>> 
>> -- John
>> 
>> On 14 Oct 2014, at 09:18, Tilman Hausherr <[email protected]> wrote:
>> 
>>> Here are some:
>>> 
>>> 055/055794.pdf
>>> 082/082463.pdf
>>> 108/108362.pdf
>>> 113/113223.pdf
>>> 115/115458.pdf
>>> 115/115463.pdf
>>> 122/122393.pdf
>>> 129/129416.pdf
>>> 133/133423.pdf
>>> 148/148020.pdf
>>> 152/152012.pdf
>>> 161/161466.pdf
>>> 
>>> to be found here:
>>> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
>>> 
>>> Tilman
>>> 
>>> On 14.10.2014 at 21:06, John Hewson wrote:
>>>> Unless somebody provides us with a list of those files, I think this
>>>> is an unreasonable request. As long as we continue to leave the old parser
>>>> in PDFBox, we won’t get the bug reports we need to fix the new
>>>> parser, and the situation will never resolve itself. Falling back to the
>>>> old parser is just as bad: we won’t get bug reports.
>>>> 
>>>> -- John
>>>> 
>>>> On 14 Oct 2014, at 07:39, Tilman Hausherr <[email protected]> wrote:
>>>> 
>>>>> I prefer that the "old" parser not be removed, because there are many
>>>>> files that can only be parsed by the old parser. This came out of a
>>>>> large-scale test with TIKA.
>>>>> 
>>>>> The best idea (in my current opinion) is to use the nonSeq parser first,
>>>>> and to fall back to the old parser if it throws an exception.
>>>>> 
>>>>> Tilman
>>>>> 
>>>>> On 14.10.2014 at 09:45, Timo Boehme wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> On 14.10.2014 at 07:22, John Hewson wrote:
>>>>>>> Hi,
>>>>>>>>> John Hewson <[email protected]> wrote on 10 October 2014 at 20:05:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>        - Parsing (Andreas?)
>>>>>>>> I guess we won't get a completely new parser in 2.0, but I'll try to
>>>>>>>> improve the XRef and the COSStream stuff.
>>>>>>> It would be great if we could get rid of the old parser and switch to 
>>>>>>> the non-sequential
>>>>>>> parser, WDYT?
>>>>>> I would also propose removing the old parser completely. That way we
>>>>>> would be more flexible in parsing streams etc., since parts of the
>>>>>> non-sequential parser are a compromise to work side by side with the old
>>>>>> parser.
>>>>>> Possibly there is a small number of functions for which the old parser
>>>>>> is still needed, e.g. signing?
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Timo
>>>>>> 
>>>>>> 
>> 
>
