Intermittent thread safety issue with EnwikiDocMaker ----------------------------------------------------
Key: LUCENE-1117 URL: https://issues.apache.org/jira/browse/LUCENE-1117 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Affects Versions: 2.2, 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.3 Attachments: LUCENE-1117.patch Intermittent thread safety issue with EnwikiDocMaker When I run the conf/wikipediaOneRound.alg, sometimes it gets started OK, other times (about 1/3rd the time) I see this: Exception in thread "Thread-0" java.lang.RuntimeException: java.io.IOException: Bad file descriptor at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:76) at java.lang.Thread.run(Thread.java:595) Caused by: java.io.IOException: Bad file descriptor at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:194) at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source) at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source) at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source) at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:60) ... 1 more The problem is that the thread that pulls the XML docs is started as soon as EnwikiDocMaker class is instantiated. When it's started, it uses the fileIS (FileInputStream) to feed the XML Parser. But, openFile is actually called twice on starting the alg, if you use any task deriving from ResetInputsTask, which closes the original fileIS that the XML parser may be using. I changed the thread to instead start on-demand the first time next() is called. I also removed a redundant resetInputs() call (which was opening the file more frequently than needed). Finally, I added logic in the thread to detect that the input stream was closed (because LineDocMaker.resetInputs() was called, eg, if we are not running the doc maker to exhaustion). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]