[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508833 ]
Steven Parkes commented on LUCENE-848:
--------------------------------------

Trying to reproduce now. Something that came up while restarting the fetch/decompress/etc. was the number of files this procedure creates. It's a lot: one for each article. I used the existing benchmark code for doing this stuff, but perhaps it's not a good idea at this scale? For one thing, it pretty much kills ant, since ant wants to walk the subtrees for some of its tasks. Either we need to exclude the work and temp directories from ant's walks, and/or we should come up with something better than one file per article. I think Mike mentioned not doing the one file per article. I'll try to look at that; a rough sketch of one possible approach is below, after the quoted issue details.

> Add support for Wikipedia English as a corpus in the benchmarker stuff
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-848
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Steven Parkes
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt,
>                      LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt,
>                      LUCENE-848.txt, WikipediaHarvester.java, xerces.jar,
>                      xerces.jar, xml-apis.jar
>
>
> Add support for using Wikipedia for benchmarking.
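For concreteness, here is a minimal sketch of the aggregated-file idea: append articles to a shared output file and roll over to a new file once a size threshold is hit. This is illustrative only, not from any attached patch; the class name, separator string, output file names, and threshold are all made up for the example.

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    // Hypothetical sketch: write many articles per file instead of one
    // file per article, rolling to a new file at a size threshold.
    public class AggregatedArticleWriter {

      private static final long MAX_FILE_CHARS = 10 * 1024 * 1024; // roll at ~10M chars of text
      private static final String SEPARATOR = "\n==== ARTICLE ====\n"; // made-up record marker

      private final File dir;
      private int fileIndex = 0;
      private FileWriter out;
      private long written = 0;

      public AggregatedArticleWriter(File dir) throws IOException {
        this.dir = dir;
        openNext();
      }

      // Close the current output file (if any) and start the next one.
      private void openNext() throws IOException {
        if (out != null) out.close();
        out = new FileWriter(new File(dir, "articles-" + (fileIndex++) + ".txt"));
        written = 0;
      }

      // Append one article; start a new file when the current one is full.
      public void addArticle(String title, String body) throws IOException {
        if (written >= MAX_FILE_CHARS) openNext();
        String record = SEPARATOR + title + "\n" + body + "\n";
        out.write(record);
        written += record.length(); // counts chars, close enough for plain text
      }

      public void close() throws IOException {
        out.close();
      }
    }

Rolling by size would keep any single file manageable while cutting the file count from one per article down to a few hundred, which is what would make ant's directory walks cheap again.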