[jira] Updated: (LUCENE-2181) benchmark for collation

Robert Muir (JIRA) Sun, 10 Jan 2010 16:21:16 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-2181:
--------------------------------

    Attachment: LUCENE-2181.patch

OK, somehow it completely escaped me that you are using the ReadTokens task :)

This is a problem, because ReadTokens does not respect the DocMaker
configuration. In my opinion it should not tokenize fields unless they are
configured to be tokenized.

So I added the following in this patch to fix this:
{noformat}
     for(final Fieldable field : fields) {
+      if (!field.isTokenized()) continue;
+
{noformat}
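
To illustrate the effect of that guard outside of the benchmark code, here is a minimal, self-contained sketch. It does not use Lucene itself: SimpleField and DemoField are hypothetical stand-ins for Fieldable that model only a name and the isTokenized() flag, and fieldsToAnalyze is an illustrative helper, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizedFieldFilter {

    /** Hypothetical stand-in for Lucene's Fieldable; models only what the guard needs. */
    interface SimpleField {
        String name();
        boolean isTokenized();
    }

    /** Hypothetical field implementation used only for this illustration. */
    static final class DemoField implements SimpleField {
        private final String name;
        private final boolean tokenized;

        DemoField(String name, boolean tokenized) {
            this.name = name;
            this.tokenized = tokenized;
        }

        public String name() { return name; }
        public boolean isTokenized() { return tokenized; }
    }

    /** Returns the names of the fields that would be run through the analyzer. */
    static List<String> fieldsToAnalyze(List<? extends SimpleField> fields) {
        List<String> names = new ArrayList<>();
        for (final SimpleField field : fields) {
            // Same guard as in the patch: skip fields not configured to be tokenized.
            if (!field.isTokenized()) continue;
            names.add(field.name());
        }
        return names;
    }

    public static void main(String[] args) {
        List<DemoField> fields = Arrays.asList(
                new DemoField("body", true),
                new DemoField("docid", false));
        System.out.println(fieldsToAnalyze(fields));
    }
}
```

With the guard in place, only fields flagged as tokenized reach the analysis step, so the time measured is no longer inflated by fields the DocMaker configuration marks as untokenized.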

Now we get the results we expect:
||Language||java.text||ICU4J||KeywordAnalyzer||ICU4J Improvement||
|English|3.43s|2.21s|1.15s|115%|
|French|3.78s|2.37s|1.17s|117%|
|German|3.84s|2.42s|1.18s|115%|
|Ukrainian|5.81s|3.67s|1.24s|88%|
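
The comment does not say how the "ICU4J Improvement" column is derived. The figures appear consistent with treating the KeywordAnalyzer time as a baseline, subtracting it from both collation timings, and comparing what remains. That formula is an assumption inferred from the numbers above, not something stated in the issue, and the class and method names below are made up for illustration.

```java
public class CollationImprovement {

    /**
     * Assumed formula: relative speedup of ICU4J over java.text after
     * subtracting the baseline (KeywordAnalyzer) time from both timings.
     */
    static double improvementPercent(double jdkSeconds, double icuSeconds, double baselineSeconds) {
        double jdkCost = jdkSeconds - baselineSeconds; // time attributable to java.text collation
        double icuCost = icuSeconds - baselineSeconds; // time attributable to ICU4J collation
        return (jdkCost / icuCost - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Timings copied from the table above (seconds).
        System.out.printf("English:   %.1f%%%n", improvementPercent(3.43, 2.21, 1.15));
        System.out.printf("French:    %.1f%%%n", improvementPercent(3.78, 2.37, 1.17));
        System.out.printf("German:    %.1f%%%n", improvementPercent(3.84, 2.42, 1.18));
        System.out.printf("Ukrainian: %.1f%%%n", improvementPercent(5.81, 3.67, 1.24));
    }
}
```

Because the table reports timings rounded to two decimals, values computed this way can differ from the listed percentages by a fraction of a point.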

If you comment out the doc.tokenized=false setting, you get the other results I
just posted instead, because the other fields are analyzed too.


> benchmark for collation
> -----------------------
>
>                 Key: LUCENE-2181
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2181
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2181.patch, LUCENE-2181.patch, LUCENE-2181.patch, 
> LUCENE-2181.patch, LUCENE-2181.patch, 
> top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2
>
>
> Steven Rowe attached a contrib/benchmark-based benchmark for collation (both 
> jdk and icu) under LUCENE-2084, along with some instructions to run it... 
> I think it would be nice if we could turn this into a committable patch and 
> add it to benchmark.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


