Intermittent thread safety issue with EnwikiDocMaker
----------------------------------------------------

                 Key: LUCENE-1117
                 URL: https://issues.apache.org/jira/browse/LUCENE-1117
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/benchmark
    Affects Versions: 2.2, 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3
         Attachments: LUCENE-1117.patch

Intermittent thread safety issue with EnwikiDocMaker

When I run the conf/wikipediaOneRound.alg, sometimes it gets started
OK, other times (about 1/3rd the time) I see this:

     Exception in thread "Thread-0" java.lang.RuntimeException: 
java.io.IOException: Bad file descriptor
        at 
org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:76)
        at java.lang.Thread.run(Thread.java:595)
     Caused by: java.io.IOException: Bad file descriptor
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at 
org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown 
Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
        at 
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
 Source)
        at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at 
org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:60)
        ... 1 more

The problem is that the thread that pulls the XML docs is started as
soon as EnwikiDocMaker class is instantiated.  When it's started, it
uses the fileIS (FileInputStream) to feed the XML Parser.  But,
openFile is actually called twice on starting the alg, if you use any
task deriving from ResetInputsTask, which closes the original fileIS
that the XML parser may be using.

I changed the thread to instead start on-demand the first time next()
is called.  I also removed a redundant resetInputs() call (which was
opening the file more frequently than needed).  Finally, I added logic
in the thread to detect that the input stream was closed (because
LineDocMaker.resetInputs() was called, eg, if we are not running the
doc maker to exhaustion).



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to