[jira] Commented: (LUCENE-2181) benchmark for collation

Robert Muir (JIRA) Sun, 10 Jan 2010 09:26:17 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798510#action_12798510
 ]


Robert Muir commented on LUCENE-2181:
-------------------------------------

Steve, ahh, I was wondering about the per-field analyzer wrapper (sorry, I 
neglected to mention this; I just forgot about it)... There are likely other 
problems too, and the new stuff needs tests.

About this per-field thing: what if, in the data files, title and date were 
simply blank?

Or should we even worry? I agree it's stupid, but does it skew the results?
One way to look at it is that it's also fairly realistic (even though it's 
meaningless, you see numbers and dates everywhere).

The downside of using a per-field analyzer wrapper is that it introduces some 
complexity. In all honesty, this is not really specific to this collation task, 
right? (I.e., the existing analysis/tokenization benchmarks have this same 
problem.)
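
For reference, a minimal sketch of what the per-field approach amounts to: 
collation is applied to one field only, and every other field (title, date) 
passes through unchanged. This is plain JDK code, not the benchmark patch and 
not Lucene's API; the class name PerFieldSketch, its methods, and the field 
names "body" and "title" are all hypothetical and for illustration only.

```java
import java.text.Collator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a per-field analyzer wrapper (not Lucene's API). */
public class PerFieldSketch {
    // An "analyzer" here is just a String -> String function (an assumption
    // made to keep the sketch self-contained).
    private final Function<String, String> defaultAnalyzer;
    private final Map<String, Function<String, String>> perField = new HashMap<>();

    public PerFieldSketch(Function<String, String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Registers an analyzer for one field; other fields use the default. */
    public void addAnalyzer(String field, Function<String, String> analyzer) {
        perField.put(field, analyzer);
    }

    /** Dispatches on the field name, falling back to the default analyzer. */
    public String analyze(String field, String value) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(value);
    }

    /** Turns a value into the hex encoding of its JDK collation key. */
    public static Function<String, String> collating(Collator collator) {
        return value -> {
            StringBuilder sb = new StringBuilder();
            for (byte b : collator.getCollationKey(value).toByteArray()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        };
    }

    public static void main(String[] args) {
        // Default: leave the value as-is, so title and date are not collated.
        PerFieldSketch wrapper = new PerFieldSketch(Function.identity());
        // Only the (hypothetical) "body" field gets collation keys.
        wrapper.addAnalyzer("body", collating(Collator.getInstance(Locale.ENGLISH)));

        System.out.println(wrapper.analyze("title", "2010-01-10"));
        System.out.println(wrapper.analyze("body", "collation"));
    }
}
```

The extra map and fallback are the complexity in question: small, but every 
analysis benchmark that wants realistic per-field behavior would need the same 
wiring.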


> benchmark for collation
> -----------------------
>
>                 Key: LUCENE-2181
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2181
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2181.patch, LUCENE-2181.patch, 
> top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2
>
>
> Steven Rowe attached a contrib/benchmark-based benchmark for collation (both 
> jdk and icu) under LUCENE-2084, along with some instructions to run it... 
> I think it would be nice if we could turn this into a committable patch and 
> add it to benchmark.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


