[
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2181:
--------------------------------
Attachment: LUCENE-2181.patch
OK, somehow it completely bypassed my brain that you are using the ReadTokens
task :)
This is a problem, because ReadTokens does not respect the DocMaker
configuration. In my opinion it should not tokenize fields unless they are
configured to be tokenized.
So I added the following check to this patch to fix it:
{noformat}
for(final Fieldable field : fields) {
+ if (!field.isTokenized()) continue;
+
{noformat}
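A minimal, self-contained sketch of what that guard does, for readers without the patch at hand. It does not use the Lucene classes: SketchField is a hypothetical stand-in for Fieldable with only the isTokenized() check, and the field names "docid" and "body" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Fieldable; only what this sketch needs.
interface SketchField {
    String name();
    boolean isTokenized();
}

class TokenizedFieldFilter {
    // Returns the names of the fields that would be run through the analyzer.
    static List<String> fieldsToAnalyze(List<SketchField> fields) {
        List<String> analyzed = new ArrayList<>();
        for (final SketchField field : fields) {
            if (!field.isTokenized()) continue; // same guard as in the patch
            analyzed.add(field.name());
        }
        return analyzed;
    }

    // Small helper to build an example field (names are illustrative only).
    static SketchField field(final String name, final boolean tokenized) {
        return new SketchField() {
            public String name() { return name; }
            public boolean isTokenized() { return tokenized; }
        };
    }

    public static void main(String[] args) {
        List<SketchField> fields = new ArrayList<>();
        fields.add(field("docid", false)); // not tokenized: skipped
        fields.add(field("body", true));   // tokenized: analyzed
        System.out.println(fieldsToAnalyze(fields)); // prints [body]
    }
}
```

With the guard in place, only fields marked as tokenized contribute to the measured analysis time.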
now we get the results we expect:
||Language||java.text||ICU4J||KeywordAnalyzer||ICU4J Improvement||
|English|3.43s|2.21s|1.15s|115%|
|French|3.78s|2.37s|1.17s|117%|
|German|3.84s|2.42s|1.18s|115%|
|Ukrainian|5.81s|3.67s|1.24s|88%|
If you comment out doc.tokenized=false, you get the other results I just
posted instead, since it will then analyze the other fields too.
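The comment does not say how the "ICU4J Improvement" column is computed. The figures in the table above are consistent, to within about one percentage point, with subtracting the KeywordAnalyzer time as a baseline from both timings and then comparing what is left. The sketch below works that out under this assumption; the class and method names are hypothetical, and the rounding used for the table is not known.

```java
class CollationImprovement {
    // Assumed formula: relative speedup of ICU4J over java.text after
    // subtracting the KeywordAnalyzer baseline from both timings, in percent.
    static double improvementPercent(double jdkSecs, double icuSecs, double baselineSecs) {
        double jdkCost = jdkSecs - baselineSecs; // time attributable to java.text collation
        double icuCost = icuSecs - baselineSecs; // time attributable to ICU4J collation
        return (jdkCost / icuCost - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Timings in seconds, copied from the table above.
        System.out.printf("English:   %.1f%%%n", improvementPercent(3.43, 2.21, 1.15));
        System.out.printf("French:    %.1f%%%n", improvementPercent(3.78, 2.37, 1.17));
        System.out.printf("German:    %.1f%%%n", improvementPercent(3.84, 2.42, 1.18));
        System.out.printf("Ukrainian: %.1f%%%n", improvementPercent(5.81, 3.67, 1.24));
    }
}
```

Subtracting the baseline separates the cost of collation itself from the fixed cost of reading and indexing the documents, which both collators share.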
> benchmark for collation
> -----------------------
>
> Key: LUCENE-2181
> URL: https://issues.apache.org/jira/browse/LUCENE-2181
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Robert Muir
> Assignee: Robert Muir
> Attachments: LUCENE-2181.patch, LUCENE-2181.patch, LUCENE-2181.patch,
> LUCENE-2181.patch, LUCENE-2181.patch,
> top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2
>
>
> Steven Rowe attached a contrib/benchmark-based benchmark for collation (both
> jdk and icu) under LUCENE-2084, along with some instructions to run it...
> I think it would be nice if we could turn this into a committable patch and
> add it to benchmark.