On Jun 4, 2009, at 2:49 PM, Grant Ingersoll wrote:
> Looking more, I think my problem resides around the notion that I'm
> using EnwikiDocMaker independently of the benchmarking tool. The
> weird thing is, it used to work, but I don't know when it broke. I
> suspect I'm not initializing things right.
> Anyone else doing that?
Answering my own question, calling resetInputs() first is the key.
For the record, I was seeing the following exception when calling
EnwikiDocMaker standalone:
Exception in thread "Thread-0" Exception in thread "main"
org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException
[INFO] at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.next(EnwikiDocMaker.java:167)
[INFO] at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.makeDocument(EnwikiDocMaker.java:300)
[INFO] at com.lucidimagination.wikipedia.indexing.Indexer.index(Indexer.java:66)
[INFO] at com.lucidimagination.wikipedia.indexing.Indexer.main(Indexer.java:115)
[INFO] java.lang.RuntimeException: java.net.MalformedURLException
[INFO] at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:129)
[INFO] at java.lang.Thread.run(Thread.java:637)
[INFO] Caused by: java.net.MalformedURLException
[INFO] at java.net.URL.<init>(URL.java:601)
[INFO] at java.net.URL.<init>(URL.java:464)
[INFO] at java.net.URL.<init>(URL.java:413)
[INFO] at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
[INFO] at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
[INFO] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[INFO] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[INFO] at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[INFO] at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[INFO] at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:103)
[INFO] ... 1 more
And here's the code:
EnwikiDocMaker docMaker = new EnwikiDocMaker();
Properties properties = new Properties();
//fileName = config.get("docs.file", null);
String filePath = wikipediaXML.getAbsolutePath();
properties.setProperty("docs.file", filePath);
docMaker.setConfig(new Config(properties));
docMaker.resetInputs();
//docMaker.openFile();
Document doc = null;
List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>(200);
int i = 0;
SolrInputDocument sDoc = null;
long start = System.currentTimeMillis();
while ((doc = docMaker.makeDocument()) != null && i < numDocs) {
...
}
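
One plausible reading of the MalformedURLException in the trace (an assumption, not something confirmed from the EnwikiDocMaker source) is that the SAX parser was started on an input that had never been opened. With no stream and no system ID to fall back on, the parser has nothing to read. The sketch below reproduces that situation using only the JDK's built-in SAX parser. NullStreamParseDemo and tryParse are hypothetical names made up for this illustration and are not part of Lucene or the Indexer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Lucene or of the Indexer above.
final class NullStreamParseDemo {

    /**
     * Parses the given stream with the JDK's default SAX parser.
     * Returns null on success, or the exception that the parse threw.
     */
    static Exception tryParse(InputStream in) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());
            // An InputSource built from a null stream carries no byte stream,
            // no character stream, and no system ID for the parser to use.
            reader.parse(new InputSource(in));
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a file that was opened before parsing started.
        InputStream opened = new ByteArrayInputStream(
                "<page><title>t</title></page>".getBytes(StandardCharsets.UTF_8));
        System.out.println("opened stream:   " + tryParse(opened));

        // Stand-in for a parser started before its input was opened.
        System.out.println("unopened stream: " + tryParse(null));
    }
}
```

The first call should return null (a clean parse) and the second should return an exception; the exact exception type may differ between parser implementations and JDK versions. If this reading is right, it would explain why calling resetInputs() before the first makeDocument() makes the error go away.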
-Grant