[ https://issues.apache.org/jira/browse/LUCENE-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229275#comment-14229275 ]
Ramkumar Aiyengar commented on LUCENE-6073:
-------------------------------------------

bq. I'm confused about what looks like leniency in extract(). Does ExtractWikipedia do this too? Is there a good reason to ignore exceptions?

I haven't looked at ExtractWikipedia yet; it might be affected by the same directory deletion issue. I will check. The only "good reason" was that the particular download I had happened to have bad data on one line, and it seemed reasonable to continue with the other files in that case: this is only benchmark data, so at worst we would end up with a few fewer docs.

bq. extractFile should just use java.io.LineNumberReader

Will check.

bq. is there any way to test this thing? there is a 20-line testfile in o.a.l.benchmark.byTask

I checked this by running {{ant get-files}} in the benchmark module (which {{ant run-task}} eventually calls). On a clean checkout this used to fail while trying to extract the files; with this change it no longer does. But did you mean a proper test suite run through Jenkins? It could probably use one.

> Fix directory deletion in ExtractReuters, recover from errors
> -------------------------------------------------------------
>
>                 Key: LUCENE-6073
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6073
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/benchmark
>            Reporter: Ramkumar Aiyengar
>            Priority: Minor
>
> ExtractReuters in the benchmark module currently fails because it
> creates the output directory and then calls {{IOUtils.rm}} on it, which
> removes all files in it as well as the output directory itself. This
> issue fixes that behaviour.
> While I was at it, I also added a bit more logging in case of file errors
> (the download I had contained some bad data) and made the task recover
> in case of issues with one file.
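
For illustration only, here is a rough sketch of the two behaviours described in the issue: clear the output directory before (re)creating it rather than after, and keep going when a single input file is bad. This is not the attached patch. It uses only JDK classes instead of Lucene's {{IOUtils}}, and the class name {{ExtractSketch}}, the method names, the {{*.sgm}} glob and the NUL-character "bad data" check are all assumptions made up for the example.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for the extraction task; not the real ExtractReuters.
public class ExtractSketch {

  /** Removes outputDir and its contents if present, then creates it empty. */
  static void resetOutputDir(Path outputDir) throws IOException {
    if (Files.exists(outputDir)) {
      try (Stream<Path> walk = Files.walk(outputDir)) {
        // Reverse order so children are deleted before their parent directory.
        walk.sorted(Comparator.reverseOrder()).forEach(p -> {
          try {
            Files.delete(p);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
      }
    }
    // Create the directory last, so it still exists when extraction starts.
    Files.createDirectories(outputDir);
  }

  /** Extracts every matching input file; returns how many files were skipped. */
  static int extractAll(Path inputDir, Path outputDir) throws IOException {
    resetOutputDir(outputDir);
    int failures = 0;
    try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir, "*.sgm")) {
      for (Path file : files) {
        try {
          extractFile(file, outputDir);
        } catch (IOException | RuntimeException e) {
          // Log and continue: one bad file should not abort the whole run.
          failures++;
          System.err.println("Skipping " + file + ": " + e);
        }
      }
    }
    return failures;
  }

  /** Copies the lines of one file; fails with the line number on bad data. */
  static void extractFile(Path file, Path outputDir) throws IOException {
    List<String> lines = new ArrayList<>();
    try (LineNumberReader reader =
        new LineNumberReader(Files.newBufferedReader(file, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Placeholder validity check (an assumption, not the real parser).
        if (line.indexOf('\u0000') >= 0) {
          throw new IOException("bad data at line " + reader.getLineNumber());
        }
        lines.add(line);
      }
    }
    Path target = outputDir.resolve(file.getFileName() + ".txt");
    Files.write(target, lines, StandardCharsets.UTF_8);
  }
}
```

The ordering in {{resetOutputDir}} is the point of the first half: deleting after creating leaves no directory to write into. Returning the failure count from {{extractAll}} is one way to keep the leniency visible to the caller instead of silently swallowing exceptions.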
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)