On Jun 4, 2009, at 2:49 PM, Grant Ingersoll wrote:

Looking more, I think my problem comes from using EnwikiDocMaker independently of the benchmarking tool. The weird thing is, it used to work, but I don't know when it broke. I suspect I'm not initializing things right.

Anyone else doing that?

Answering my own question, calling resetInputs() first is the key.

For the record, I was seeing the following exception when calling the EWDM standalone:

Exception in thread "Thread-0" Exception in thread "main" org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException
[INFO]  at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.next(EnwikiDocMaker.java:167)
[INFO]  at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.makeDocument(EnwikiDocMaker.java:300)
[INFO]  at com.lucidimagination.wikipedia.indexing.Indexer.index(Indexer.java:66)
[INFO]  at com.lucidimagination.wikipedia.indexing.Indexer.main(Indexer.java:115)
[INFO] java.lang.RuntimeException: java.net.MalformedURLException
[INFO]  at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:129)
[INFO]  at java.lang.Thread.run(Thread.java:637)
[INFO] Caused by: java.net.MalformedURLException
[INFO]  at java.net.URL.<init>(URL.java:601)
[INFO]  at java.net.URL.<init>(URL.java:464)
[INFO]  at java.net.URL.<init>(URL.java:413)
[INFO]  at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
[INFO]  at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
[INFO]  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[INFO]  at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[INFO]  at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[INFO]  at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
[INFO]  at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:103)
[INFO]  ... 1 more


And here's the code:
      EnwikiDocMaker docMaker = new EnwikiDocMaker();
      Properties properties = new Properties();
      //fileName = config.get("docs.file", null);
      String filePath = wikipediaXML.getAbsolutePath();
      properties.setProperty("docs.file", filePath);
      docMaker.setConfig(new Config(properties));
      docMaker.resetInputs();
      //docMaker.openFile();
      Document doc = null;
      List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>(200);
      int i = 0;
      SolrInputDocument sDoc = null;
      long start = System.currentTimeMillis();
      while ((doc = docMaker.makeDocument()) != null && i < numDocs) {
        ...
      }


-Grant
