[ https://issues.apache.org/jira/browse/LUCENE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1117:
---------------------------------------

    Attachment: LUCENE-1117.patch

Attached patch. All tests pass. I plan to commit in a day or two.

> Intermittent thread safety issue with EnwikiDocMaker
> ----------------------------------------------------
>
>                 Key: LUCENE-1117
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1117
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/benchmark
>    Affects Versions: 2.2, 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1117.patch
>
>
> Intermittent thread safety issue with EnwikiDocMaker
>
> When I run the conf/wikipediaOneRound.alg, sometimes it gets started
> OK, other times (about 1/3rd the time) I see this:
>
> Exception in thread "Thread-0" java.lang.RuntimeException: java.io.IOException: Bad file descriptor
>     at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:76)
>     at java.lang.Thread.run(Thread.java:595)
> Caused by: java.io.IOException: Bad file descriptor
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:194)
>     at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
>     at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
>     at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>     at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
>     at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:60)
>     ... 1 more
>
> The problem is that the thread that pulls the XML docs is started as
> soon as EnwikiDocMaker class is instantiated. When it's started, it
> uses the fileIS (FileInputStream) to feed the XML Parser. But,
> openFile is actually called twice on starting the alg, if you use any
> task deriving from ResetInputsTask, which closes the original fileIS
> that the XML parser may be using.
>
> I changed the thread to instead start on-demand the first time next()
> is called. I also removed a redundant resetInputs() call (which was
> opening the file more frequently than needed). Finally, I added logic
> in the thread to detect that the input stream was closed (because
> LineDocMaker.resetInputs() was called, e.g., if we are not running the
> doc maker to exhaustion).
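
For readers who want to see the shape of the on-demand start described above, here is a minimal sketch. It is not the code in LUCENE-1117.patch: the class LazyLineFeeder, its methods, and the line-based input are hypothetical stand-ins for EnwikiDocMaker and its XML parser thread. It covers only the lazy start and the "stream closed means end of input" handling, and it assumes reset() is called only before the first next().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: the reader thread starts on the first next() call, not in the constructor. */
class LazyLineFeeder {
    // Sentinel marking end of input; compared by identity, never by content.
    private static final String EOF = new String("<eof>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private Reader source;
    private Thread worker; // stays null until the first next() call

    LazyLineFeeder(Reader source) {
        this.source = source; // deliberately no thread start here
    }

    /** Swaps the input. Safe before the first next(), because nothing is reading yet. */
    synchronized void reset(Reader newSource) throws IOException {
        if (worker != null) {
            throw new IllegalStateException("reader thread already started");
        }
        source.close();
        source = newSource;
    }

    synchronized boolean isStarted() {
        return worker != null;
    }

    /** Returns the next line, or null once the input is exhausted or was closed. */
    String next() throws InterruptedException {
        synchronized (this) {
            if (worker == null) {
                final Reader in = source; // capture whichever input is current right now
                worker = new Thread(() -> pump(in));
                worker.setDaemon(true);
                worker.start();
            }
        }
        String line = queue.take();
        if (line == EOF) {
            queue.offer(EOF); // keep the sentinel so later calls also return null
            return null;
        }
        return line;
    }

    private void pump(Reader in) {
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                queue.put(line);
            }
        } catch (IOException e) {
            // The stream was closed underneath us; treat it as end of input.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            queue.offer(EOF);
        }
    }
}
```

Because no thread exists until next() is called, an early reset that closes and reopens the input cannot pull a stream out from under a running reader, which is the race the stack trace above shows.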