[ https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796066#action_12796066 ]
Robert Muir commented on LUCENE-2181:
-------------------------------------

bq. I do have one concern, though: the LineDocSource parser doesn't know how to handle comments, so these four files don't have Apache2 license declarations in them. We should put a README (or something like it) with these files to indicate the license.

Are they really Apache-licensed, or are they derived from Wikipedia content?

If these files are only downloaded when you run 'ant benchmark' for collation, then this is just like the enwiki task in benchmark, which downloads a huge Wikipedia dump and runs on it. So someone please correct me if I am wrong, but I don't think we should be putting Apache license headers in these files anyway: as with the benchmark enwiki task, we are not shipping them with our source distribution.

bq. Different subject: I'm not sure where it would go, but the code I used to produce these top-TF wikipedia files may be useful to other people - where do you think it could live? An example, maybe?

Hmm, I will have to think about this... anyone got ideas? I think this would be useful too (I admit I have not yet looked at the implementation). Here are two examples:
* Karl could use this to evaluate his Swedish stemming improvements, taking frequency into account.
* The obvious use: when you need to build a stopword list, these top terms are where you want to start.

> benchmark for collation
> -----------------------
>
> Key: LUCENE-2181
> URL: https://issues.apache.org/jira/browse/LUCENE-2181
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Robert Muir
> Assignee: Robert Muir
> Attachments: LUCENE-2181.patch.zip
>
>
> Steven Rowe attached a contrib/benchmark-based benchmark for collation (both
> jdk and icu) under LUCENE-2084, along with some instructions to run it...
> I think it would be nice if we could turn this into a committable patch and
> add it to benchmark.
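On the stopword use case: ranking terms by their total frequency across a collection can be sketched in a few lines of Java. This is not the code Steven used to produce the top-TF files; it is a hypothetical `TopTerms` class that assumes one document per line, lowercases with `Locale.ROOT`, and splits on runs of non-letter/non-digit characters instead of running the text through a Lucene Analyzer. If fed a LineDocSource file as-is, the title and date fields on each line would be counted along with the body.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: counts total term frequency (TF) over all lines and
// returns the n most frequent terms, ties broken alphabetically.
public class TopTerms {

    public static List<Map.Entry<String, Integer>> topTerms(Iterable<String> docs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Naive tokenization (an assumption): split on anything that is not a letter or digit.
            for (String tok : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!tok.isEmpty()) {
                    counts.merge(tok, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    // Highest count first, then alphabetical for stable output.
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: TopTerms <file> [n]");
            return;
        }
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 100;
        // Reads the whole file into memory; fine for a sketch, not for a full Wikipedia dump.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : topTerms(lines, n)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running something like `java TopTerms docs.txt 200` prints tab-separated term/count pairs; the top of that list is a candidate stopword list to review by hand, not a finished one. Counting document frequency instead of TF would only require counting each term once per line.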