[ https://issues.apache.org/jira/browse/LUCENE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518524 ]
Michael McCandless commented on LUCENE-971:
-------------------------------------------

Patch looks good; a few comments:

  * In conf/wikipedia.alg:

      - The comment says "Reuters" but should say "Wikipedia"

      - It's only processing 1 doc? I think you should change the ": 1" to ": *"?

      - Maybe rename this to conf/extractEnWikipedia.alg?

  * When I tried to run this I hit OOM (on Linux). Then I changed the line in conf/wikipedia.alg to this:

        {WriteLineDoc() > : *

    And OOM went away and I was able to produce the full line file. That change tells benchmark not to record PerfTask details. So I think we should make that change too.

> Create enwiki indexable data as line-per-article rather than file-per-article
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-971
>                 URL: https://issues.apache.org/jira/browse/LUCENE-971
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: LUCENE-971.patch.txt, LUCENE-971.patch.txt
>
>
> Create a line per article rather than a file. Consume with indexLineFile task.