[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Parkes updated LUCENE-848:
---------------------------------

    Attachment: LUCENE-848.txt

Here's the patch with the README.

By the way, there's also a .rsync-filter in the patch, which I never described. rsync has an option that makes it look for these filter files and skip the files/directories specified in them. Since I sometimes rsync working copies around to test on different machines, and since I don't want to copy around Wikipedia (or the other datasets), I exclude those in the filter file. Without the appropriate rsync option, the filter files are simply ignored, so I would think this would be a good thing to have ...

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-848
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Steven Parkes
>         Assigned To: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java, xerces.jar, xerces.jar, xml-apis.jar
>
>
> Add support for using Wikipedia for benchmarking.
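
For readers unfamiliar with the .rsync-filter mechanism mentioned in the comment, here is a minimal sketch of how such a file and the matching rsync invocation might look. The excluded directory names (temp, work), the paths, and the host name are hypothetical placeholders; the rules in the actual patch may differ. The option in question is presumably rsync's -F, which is shorthand for --filter='dir-merge /.rsync-filter'.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

# Hypothetical dataset directories; the rules in the actual patch may differ.
EXCLUDED_DIRS = ("temp", "work")


def write_filter_file(directory: Path, excluded_dirs: Sequence[str] = EXCLUDED_DIRS) -> Path:
    """Write a per-directory .rsync-filter that excludes the given subdirectories."""
    # In "- /name/": "-" means exclude, the leading "/" anchors the pattern to
    # the directory holding the filter file, and the trailing "/" restricts the
    # match to directories.
    rules = [f"- /{name}/" for name in excluded_dirs]
    path = directory / ".rsync-filter"
    path.write_text("\n".join(rules) + "\n")
    return path


def rsync_command(src: str, dest: str) -> List[str]:
    """Build an rsync argument list that honors .rsync-filter files."""
    # -F is shorthand for --filter='dir-merge /.rsync-filter'. Without it,
    # rsync never reads the filter files, so they have no effect.
    return ["rsync", "-a", "-F", src, dest]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_filter_file(Path(tmp)).read_text(), end="")
    # Hypothetical source path and destination host.
    print(" ".join(rsync_command("contrib/benchmark/", "otherhost:benchmark/")))
```

The command is only built here, not executed, so the sketch runs without rsync installed. Dropping -F from the argument list would copy the datasets along with everything else.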