[ https://issues.apache.org/jira/browse/LUCENE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556591#action_12556591 ]

Grant Ingersoll commented on LUCENE-1117:
-----------------------------------------

I am now getting:
Exception in thread "Thread-1" java.lang.RuntimeException: java.net.MalformedURLException
        at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:88)
        at java.lang.Thread.run(Thread.java:613)
Caused by: java.net.MalformedURLException
        at java.net.URL.<init>(URL.java:601)
        at java.net.URL.<init>(URL.java:464)
        at java.net.URL.<init>(URL.java:413)
        at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
        at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:62)
        ... 1 more

Is there something new that I need to call before calling nextDocument? I am
using this outside of the benchmark framework. It seems fileIS is not getting
initialized for me.

> Intermittent thread safety issue with EnwikiDocMaker
> ----------------------------------------------------
>
>                 Key: LUCENE-1117
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1117
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/benchmark
>    Affects Versions: 2.2, 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1117.patch
>
>
> When I run conf/wikipediaOneRound.alg, it sometimes starts OK, but
> other times (about a third of the time) I see this:
>      Exception in thread "Thread-0" java.lang.RuntimeException: java.io.IOException: Bad file descriptor
>       at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:76)
>       at java.lang.Thread.run(Thread.java:595)
>      Caused by: java.io.IOException: Bad file descriptor
>       at java.io.FileInputStream.readBytes(Native Method)
>       at java.io.FileInputStream.read(FileInputStream.java:194)
>       at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
>       at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
>       at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>       at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
>       at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
>       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:60)
>       ... 1 more
> The problem is that the thread that pulls the XML docs is started as
> soon as the EnwikiDocMaker class is instantiated.  When it starts, it
> uses fileIS (a FileInputStream) to feed the XML parser.  But openFile
> is actually called twice when the alg starts if you use any task
> deriving from ResetInputsTask, and the second call closes the
> original fileIS that the XML parser may already be using.
>
> I changed the thread to start on demand, the first time next() is
> called.  I also removed a redundant resetInputs() call (which was
> opening the file more often than needed).  Finally, I added logic in
> the thread to detect that the input stream was closed (because
> LineDocMaker.resetInputs() was called, e.g., if we are not running
> the doc maker to exhaustion).
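
The on-demand start described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not code from the attached LUCENE-1117.patch. All of the following are assumptions chosen to keep it self-contained:

* the class LazyDocProducer and its next()/resetInputs() methods
* the in-memory List<String> standing in for the XML file
* the use of Thread.interrupt() to model closing the input stream

An ArrayBlockingQueue hands documents from the producer thread to the caller. An empty Optional marks end of input.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch, not the LUCENE-1117 patch: a doc maker whose producer
 * thread starts on the first next() call instead of in the constructor.
 */
class LazyDocProducer {
  private final List<String> docs;                // stands in for the XML file
  private BlockingQueue<Optional<String>> queue;  // producer -> consumer handoff
  private Thread producer;                        // null until next() is first called
  private boolean exhausted;                      // true once end of input was seen

  public LazyDocProducer(List<String> docs) {
    this.docs = docs; // deliberately no thread started here
  }

  /** Returns the next doc, or null once the current input is exhausted. */
  public synchronized String next() {
    if (exhausted) {
      return null;
    }
    if (producer == null) {
      startProducer(); // on-demand start: the input is opened only now
    }
    Optional<String> doc;
    try {
      doc = queue.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      throw new RuntimeException(e);
    }
    if (!doc.isPresent()) {
      exhausted = true; // empty Optional is the end-of-input marker
      return null;
    }
    return doc.get();
  }

  /** Discards the current input; the next call to next() reopens it from the start. */
  public synchronized void resetInputs() {
    if (producer != null) {
      producer.interrupt(); // "close" the old input so its thread exits quietly
      producer = null;
    }
    exhausted = false;
  }

  private void startProducer() {
    // Each input gets its own queue, so a stale thread can never feed the new one.
    final BlockingQueue<Optional<String>> q = new ArrayBlockingQueue<>(10);
    final Iterator<String> input = docs.iterator(); // "reopen the file"
    Thread t = new Thread(() -> {
      try {
        while (input.hasNext()) {
          if (Thread.currentThread().isInterrupted()) {
            return; // input was closed by resetInputs(): stop without an error
          }
          q.put(Optional.of(input.next()));
        }
        q.put(Optional.empty()); // signal end of input
      } catch (InterruptedException e) {
        // Closed while blocked on a full queue: also a normal shutdown.
      }
    });
    t.setDaemon(true); // never keep the JVM alive just for this thread
    queue = q;
    producer = t;
    t.start();
  }
}
```

Because the constructor no longer starts a thread, a reset that happens before the first next() call has nothing to close. A reset after documents have been pulled interrupts the old thread. That thread treats the interrupt as a normal end instead of failing the way the "Bad file descriptor" trace above does.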

-- 
This message is automatically generated by JIRA.