That’s very good news!

-- John

> On 23 Oct 2014, at 11:40, Tilman Hausherr <[email protected]> wrote:
>
> This is now obsolete, thanks to Andreas having resolved PDFBOX-2250.
>
> Tilman
>
> On 23.10.2014 at 09:33, John Hewson wrote:
>> Do we have a JIRA issue for these, or shall I create one?
>>
>> -- John
>>
>> On 14 Oct 2014, at 09:18, Tilman Hausherr <[email protected]> wrote:
>>
>>> Here are some:
>>>
>>> 055/055794.pdf
>>> 082/082463.pdf
>>> 108/108362.pdf
>>> 113/113223.pdf
>>> 115/115458.pdf
>>> 115/115463.pdf
>>> 122/122393.pdf
>>> 129/129416.pdf
>>> 133/133423.pdf
>>> 148/148020.pdf
>>> 152/152012.pdf
>>> 161/161466.pdf
>>>
>>> to be found here:
>>> http://digitalcorpora.org/corp/nps/files/govdocs1/zipfiles/
>>>
>>> Tilman
>>>
>>> On 14.10.2014 at 21:06, John Hewson wrote:
>>>> Unless somebody provides us with a list of those files, then I think this
>>>> is an unreasonable request. As long as we continue to leave the old parser
>>>> in PDFBox, we won’t get the bug reports which we need to fix the new
>>>> parser, and the situation will never resolve itself. Falling back to the
>>>> old parser is just as bad - we won’t get bug reports.
>>>>
>>>> -- John
>>>>
>>>> On 14 Oct 2014, at 07:39, Tilman Hausherr <[email protected]> wrote:
>>>>
>>>>> I prefer that the "old" parser not be removed, because there are many
>>>>> files that can only be parsed by the old parser. This came out in a
>>>>> large scale test with TIKA.
>>>>>
>>>>> The best idea (in my current opinion) is to use the nonSeq parser first,
>>>>> and the old parser if there is an exception.
>>>>>
>>>>> Tilman
>>>>>
>>>>> On 14.10.2014 at 09:45, Timo Boehme wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 14.10.2014 at 07:22, John Hewson wrote:
>>>>>>> Hi,
>>>>>>>>> John Hewson <[email protected]> wrote on 10 October 2014 at 20:05:
>>>>>>>>>
>>>>>>>>> - Parsing (Andreas?)
>>>>>>>> I guess we won't get a complete new parser in 2.0, but I try to
>>>>>>>> improve the XRef and the COSStream stuff
>>>>>>> It would be great if we could get rid of the old parser and switch to
>>>>>>> the non-sequential parser, WDYT?
>>>>>> I would also propose to completely remove the old parser. That way we
>>>>>> are more flexible in parsing streams etc. since parts of the
>>>>>> non-sequential parser are a compromise to work side-by-side with the old
>>>>>> parser.
>>>>>> Possibly there are a small number of functions for which the old parser
>>>>>> is still needed - e.g. signing?
>>>>>>
>>>>>> Best,
>>>>>> Timo
