Do we have a JIRA issue for these, or shall I create one?

-- John

On 14 Oct 2014, at 09:18, Tilman Hausherr <thaush...@t-online.de> wrote:

> Here are some:
>
> 055/055794.pdf
> 082/082463.pdf
> 108/108362.pdf
> 113/113223.pdf
> 115/115458.pdf
> 115/115463.pdf
> 122/122393.pdf
> 129/129416.pdf
> 133/133423.pdf
> 148/148020.pdf
> 152/152012.pdf
> 161/161466.pdf
>
> to be found here:
> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
>
> Tilman
>
> On 14.10.2014 at 21:06, John Hewson wrote:
>> Unless somebody provides us with a list of those files, then I think this is
>> an unreasonable request. As long as we continue to leave the old parser in
>> PDFBox, we won’t get the bug reports which we need to fix the new parser,
>> and the situation will never resolve itself. Falling back to the old parser
>> is just as bad - we won’t get bug reports.
>>
>> -- John
>>
>> On 14 Oct 2014, at 07:39, Tilman Hausherr <thaush...@t-online.de> wrote:
>>
>>> I prefer that the "old" parser not be removed, because there are many files
>>> that can only be parsed by the old parser. This came out in a large scale
>>> test with TIKA.
>>>
>>> The best idea (in my current opinion) is to use the nonSeq parser first,
>>> and the old parser if there is an exception.
>>>
>>> Tilman
>>>
>>> On 14.10.2014 at 09:45, Timo Boehme wrote:
>>>> Hi,
>>>>
>>>> On 14.10.2014 at 07:22, John Hewson wrote:
>>>>> Hi,
>>>>>>> John Hewson <j...@jahewson.com> wrote on 10 October 2014 at 20:05:
>>>>>>>
>>>>>>> - Parsing (Andreas?)
>>>>>> I guess we won't get a complete new parser in 2.0, but I try to improve
>>>>>> the XRef and the COSStream stuff
>>>>> It would be great if we could get rid of the old parser and switch to the
>>>>> non-sequential parser, WDYT?
>>>> I would also propose to completely remove the old parser. That way we are
>>>> more flexible in parsing streams etc. since parts of the non-sequential
>>>> parser are a compromise to work side-by-side with the old parser.
>>>> Possibly there are a small number of functions for which the old parser is
>>>> still needed - e.g. signing?
>>>>
>>>> Best,
>>>> Timo
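
For reference, the fallback idea quoted above (try the nonSeq parser first, use the old parser if there is an exception) could be sketched roughly as below. This is only an illustration under assumptions: `Parser`, `FallbackParse` and `parseWithFallback` are hypothetical names, not PDFBox types, and the parsers are stand-in lambdas. The sketch logs the first failure before falling back, which is one possible way to keep the failures visible; whether that would actually produce the bug reports asked for is a separate question.

```java
import java.io.IOException;

// Hypothetical stand-in for a parser; NOT a PDFBox type.
interface Parser {
    String parse(String path) throws IOException;
}

final class FallbackParse {

    // Try the primary parser; if it throws, log the failure and try the fallback.
    static String parseWithFallback(Parser primary, Parser fallback, String path)
            throws IOException {
        try {
            return primary.parse(path);
        } catch (IOException primaryFailure) {
            // Keep the primary failure visible so it can still be reported.
            System.err.println("primary parser failed on " + path + ": "
                    + primaryFailure.getMessage());
            try {
                return fallback.parse(path);
            } catch (IOException fallbackFailure) {
                // Both parsers failed: attach the first error to the second.
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins: the "nonSeq" parser always fails, the "old" one succeeds.
        Parser nonSeq = path -> { throw new IOException("simulated parse error"); };
        Parser old = path -> "parsed by old parser: " + path;
        System.out.println(parseWithFallback(nonSeq, old, "055/055794.pdf"));
    }
}
```

With real parsers, the two lambdas would be replaced by calls into whichever load methods the PDFBox version in use provides.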