Hi Tilman,

let me take the opportunity to say thank you for your efforts around code quality and testing. That work doesn't result in a "hey, that's a great new feature", but it is a very important part of development, one that is often not directly visible yet takes time and dedication.
Sincerely yours

Maruan

On 23.01.2015 at 09:00, Tilman Hausherr <thaush...@t-online.de> wrote:

> Hi,
>
> Besides the "very broken files" (which result in errors from bad parameters
> for the PDF operators), there are the out-of-memory exceptions on huge files.
> I think that there are at most 5-10 files left with problems that can be
> solved. I'll start a new test with a bigger memory setting when the Isartor
> improvements are done, and will also open issues on the exceptions that I
> believe can be fixed.
>
> Tilman
>
> On 23.01.2015 at 08:54, Maruan Sahyoun wrote:
>> Hi Tilman,
>>
>> that's very positive. Not only is the number of failures down by another 45%,
>> the time has also been reduced a lot. That might be a hint that some of the
>> internal changes (parsing, closing …) and improvements in code quality are
>> starting to pay off.
>>
>> For the 79 files - could you be a little more specific about which errors we get?
>> Are these still the ones mentioned in your earlier post?
>>
>> BR
>>
>> Maruan
>>
>> On 23.01.2015 at 08:45, Tilman Hausherr <thaush...@t-online.de> wrote:
>>
>>> total: 231223, failed: 79, percentage failed (exceptions other than the
>>> "allowed" ValidationExceptions): 0.03416585677769035%
>>>
>>> This time it took only 2 days instead of 4. Maybe the change with closing
>>> made it faster?
>>>
>>> (This was done about a week ago, I forgot to send the posting)
>>>
>>> Tilman
>>>
>>> On 05.12.2014 at 20:45, Tilman Hausherr wrote:
>>>> Some numbers... it took 4-5 days
>>>>
>>>> total: 231223, failed: 142, percentage failed: 0.06141257472336292
>>>>
>>>> Of these, one can subtract 33 OutOfMemoryErrors that happened near the
>>>> end of the test. Isolated runs went fine.
>>>>
>>>> About the rest:
>>>> 18 are the isSymbol stack overflow
>>>> 9 are the getFontMatrix NPE
>>>> 33 are the "root must be of type Pages" errors
>>>>
>>>> The rest is mostly related to very broken PDF files.
>>>>
>>>> Tilman
>>>>
>>>>
>>>> On 04.12.2014 at 14:55, Maruan Sahyoun wrote:
>>>>> Hi Tilman,
>>>>>
>>>>> that's very good news. I trust a lot of time went into reviewing the test
>>>>> results. Without your and Tim's efforts this achievement wouldn't have been
>>>>> possible.
>>>>>
>>>>> BR
>>>>>
>>>>> Maruan
>>>>>
>>>>> On 03.12.2014 at 21:04, Tilman Hausherr <thaush...@t-online.de> wrote:
>>>>>
>>>>>> I've now run preflight on half of the govdocs files. Every issue I have
>>>>>> opened on preflight is related to that test. The failure rate
>>>>>> (exceptions other than the "allowed" ValidationExceptions) is down from
>>>>>> 1% when I started to 0.05% now. Most of the frequent exceptions (e.g.
>>>>>> the one with NonTermimalField) have been fixed. What's left now are
>>>>>> exceptions related to messy files, and some of the font-related issues.
>>>>>>
>>>>>> Tilman
>>>>>>
>>>>>> On 03.11.2014 at 22:58, Tilman Hausherr wrote:
>>>>>>> On 03.11.2014 at 19:00, Tilman Hausherr wrote:
>>>>>>>> It is not looking good, there is at least one NPE issue coming.
>>>>>>> No more NPEs after solving the two issues I opened today, except for
>>>>>>> PDFBOX-1743.pdf, which is a known problem.
>>>>>>>
>>>>>>> Coming up soon: run preflight on the 231227 PDF files from
>>>>>>> digitalcorpora to see what happens.
>>>>>>>
>>>>>>> Tilman
>>>>>>>
>>
>