On Wed, 16 Dec 2015, Nick Burch wrote:
If you want to try more coding, Tim quite often runs Tika against some large filesets, and has a nifty tool to report on what breaks. He can hopefully point you at the most recent report! Maybe have a look through that, identify a few common failures from unidentified or common exceptions, and try to fix one or two of those?

Another one might be TIKA-1817 - it needs two or three new parsers, all hopefully fairly straightforward. There'll want to be a text-based one for ASCII DXF, likely along the lines of some of the scientific text-based formats. There also needs to be a binary one for binary DXF, maybe also able to do DXB at the same time. The DWG parser might be a good starting point for that, or maybe could even be extended to do those too.
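
To give a rough idea of the scale of the ASCII DXF side, here is a minimal sketch in plain Java, with no Tika classes involved. It assumes the usual DXF layout of group code / value pairs on alternating lines, that text values of interest sit under group code 1 inside the ENTITIES section, and that binary DXF files start with the 22-byte "AutoCAD Binary DXF" sentinel. The class and method names (DxfSketch, isBinaryDxf, extractText) are made up for illustration and are not part of Tika or of TIKA-1817; a real parser would also need to deal with encodings, MTEXT continuation codes, blocks and so on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not an existing Tika class.
public class DxfSketch {
    // Binary DXF starts with "AutoCAD Binary DXF" + CR LF SUB NUL (22 bytes).
    static final byte[] BINARY_SENTINEL =
            "AutoCAD Binary DXF\r\n\u001a\u0000".getBytes(StandardCharsets.US_ASCII);

    // True if the given leading bytes match the binary DXF sentinel.
    static boolean isBinaryDxf(byte[] header) {
        if (header.length < BINARY_SENTINEL.length) {
            return false;
        }
        for (int i = 0; i < BINARY_SENTINEL.length; i++) {
            if (header[i] != BINARY_SENTINEL[i]) {
                return false;
            }
        }
        return true;
    }

    // Collects group code 1 values found inside the ENTITIES section.
    static List<String> extractText(String asciiDxf) {
        List<String> out = new ArrayList<>();
        boolean expectSectionName = false;
        boolean inEntities = false;
        try (BufferedReader r = new BufferedReader(new StringReader(asciiDxf))) {
            String codeLine;
            while ((codeLine = r.readLine()) != null) {
                String valueLine = r.readLine();
                if (valueLine == null) {
                    break; // dangling group code without a value
                }
                int code;
                try {
                    code = Integer.parseInt(codeLine.trim());
                } catch (NumberFormatException e) {
                    continue; // assumption: skip pairs with an unreadable code
                }
                String value = valueLine.trim();
                if (code == 0 && value.equals("SECTION")) {
                    expectSectionName = true;
                } else if (code == 0 && value.equals("ENDSEC")) {
                    inEntities = false;
                } else if (code == 2 && expectSectionName) {
                    inEntities = value.equals("ENTITIES");
                    expectSectionName = false;
                } else if (code == 1 && inEntities) {
                    out.add(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Restricting the collection to the ENTITIES section is a design choice in this sketch: group code 1 also carries string header variables such as $ACADVER, which are probably better treated as metadata than as document text.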

Nick
