using a french specific analyser without stemming
For a project with a lot of Lucene searching (via Compass), I had some trouble with stemming. Stemming is nice for broadening searches, but it makes completion behave strangely, so FrenchAnalyzer was not usable. The simpler StandardAnalyzer does the job, except for some French specifics such as elision. In French, "the plane" is written "l'avion", not "le avion", and the StandardTokenizer, as used with StandardFilter, can't tokenize it correctly.

So I wrote a specific filter (ElisionFilter). How can I contribute it to Lucene: with a JIRA ticket, or on the mailing list?

M.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
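The elision handling described above can be sketched as follows. This is a minimal, Lucene-independent sketch and not Mathieu's ElisionFilter. The class name `ElisionStripper` and the list of elided forms are assumptions chosen for illustration. In a Lucene 2.x analyzer, the same logic would typically sit in the `next()` method of a `TokenFilter` subclass and rewrite each token's text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper: strips a leading elided article such as "l'" or "qu'"
 * from a single token, e.g. "l'avion" -> "avion".
 */
public class ElisionStripper {

    // Assumed list of French elided forms; not taken from the original post.
    private static final Set<String> ARTICLES = new HashSet<String>(Arrays.asList(
            "l", "m", "t", "qu", "n", "s", "j", "d", "c"));

    public static String strip(String term) {
        // Find the first apostrophe, accepting both ' and the typographic U+2019.
        int apos = -1;
        for (int i = 0; i < term.length(); i++) {
            char ch = term.charAt(i);
            if (ch == '\'' || ch == '\u2019') {
                apos = i;
                break;
            }
        }
        // No apostrophe, or nothing before/after it: leave the token alone.
        if (apos <= 0 || apos == term.length() - 1) {
            return term;
        }
        String prefix = term.substring(0, apos).toLowerCase(Locale.FRENCH);
        // Only strip known articles, so "aujourd'hui" is kept intact.
        return ARTICLES.contains(prefix) ? term.substring(apos + 1) : term;
    }

    public static void main(String[] args) {
        String[] samples = {"l'avion", "qu'il", "aujourd'hui", "avion"};
        for (String s : samples) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```

Restricting the stripping to a known set of articles, rather than cutting at any apostrophe, keeps words where the apostrophe is part of the word itself.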
[jira] Updated: (LUCENE-902) Check on PositionIncrement with StopFilter.
[ https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toru Matsuzawa updated LUCENE-902:
Attachment: stopfilter20070604.patch

Patch and test.

Check on PositionIncrement with StopFilter.
Key: LUCENE-902
URL: https://issues.apache.org/jira/browse/LUCENE-902
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 2.2
Reporter: Toru Matsuzawa
Attachments: stopfilter.patch, stopfilter20070604.patch

StopFilter does not take into account the PositionIncrement set by the Tokenizer. When a stopword token has a PositionIncrement of 1, StopFilter deletes it. However, when the token that follows it has a PositionIncrement of 0, that token is not deleted. I think it should be deleted as well, because a token with a PositionIncrement of 0 is considered to be at the same position as (i.e. the same token as) the preceding one.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Lucene 2.2 soon?
Hi,

On 6/1/07, Michael Busch [EMAIL PROTECTED] wrote:
> Considering all these improvements I think it's time for a new release, especially since many of you voted in February to have releases more frequently.

Big +1 from me! We're doing a big 1.4 release of Jackrabbit in a few months, and many of the improvements you listed would be very welcome.

PS. When doing 2.2, it would be nice if you could also upload the release artifacts to the Maven repository. See the instructions in http://wiki.apache.org/jakarta-lucene/ReleaseTodo. Lucene 2.1 not being in the Maven repository is the main thing keeping Jackrabbit from upgrading right away.

BR,

Jukka Zitting
[jira] Commented: (LUCENE-622) Provide More of Lucene For Maven
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501237 ] Grant Ingersoll commented on LUCENE-622:

How does Karl's patch compare to Sami's? I haven't looked at either in depth yet, but I will try to at some point. Karl seems to be saying it has POMs for each of the contribs, etc., which is probably easier to maintain. This whole bolting on of Maven to ANT just seems a little weird to me.

FYI, Hoss, the way to validate the POM is by using Maven! :-) It does it for you.

I am beginning to think we should try having a parallel build for a little while, so that people can test out the different approaches. I think the committers doing releases will see the biggest win from Maven, but I am not 100% sure. If Karl's patch achieves this, then I think we could use it as the basis for the evaluation.

Provide More of Lucene For Maven
Key: LUCENE-622
URL: https://issues.apache.org/jira/browse/LUCENE-622
Project: Lucene - Java
Issue Type: Task
Affects Versions: 2.0.0
Reporter: Stephen Duncan Jr
Assignee: Michael Busch
Fix For: 2.2
Attachments: lucene-622.txt, lucene-core.pom, lucene-highlighter-2.0.0.pom, lucene-maven.patch, lucene-maven.tar.bz2

Please provide javadoc source jars for lucene-core. Also, please provide the rest of Lucene (the jars inside of contrib in the download bundle) if possible.
[jira] Commented: (LUCENE-902) Check on PositionIncrement with StopFilter.
[ https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501245 ] Steven Rowe commented on LUCENE-902:

Hi Toru,

I looked at your patch (though I didn't test it), and I noticed that it uses generics and varargs, both Java 1.5 features. Lucene core targets Java 1.4, so your patch needs to be rewritten to use only Java 1.4 features.

I think I understand what you're going for (filtering out all tokens at the same position as a stopword), and I think it's a useful addition to Lucene, since the naive fix, i.e. employing a StopFilter in a processing pipeline before a morphological analyzer, will negatively impact the morphological analyzer's performance. However, this behavior should not be the default: StopFilter's current behavior is well-defined, and lots of people depend on it.

I think there are (at least :) ) two possible courses of action here:

1. Include a getter/setter for a boolean field controlling whether to filter out tokens at the same position as stopwords (call it, say, removeStopwordCollocates, where I mean "collocate", as a noun, to denote tokens with the same position). This field would be initialized to false, to preserve existing behavior.
2. Change StopFilter to allow extension (remove the final in "public final class StopFilter ..."), and create a new class extending StopFilter that exhibits the behavior you want. This could start life in the sandbox.

I like option #1 better: this functionality, were it available, would quite likely be useful to a significant proportion of Lucene's user base (albeit skewed toward non-Lucene-as-black-box users).
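The difference between the current behavior and option #1 can be sketched without Lucene. This is a hypothetical illustration, not Toru's patch or Lucene's StopFilter: the `Tok` class stands in for a Lucene token, and only the flag name `removeStopwordCollocates` comes from the comment above. Because Lucene core targets Java 1.4, the sketch deliberately avoids generics, varargs and the enhanced for loop. It also does not adjust the position increments of removed tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, Lucene-independent sketch of the behavior discussed in
 * LUCENE-902, written in Java 1.4-compatible style.
 */
public class StopCollocateSketch {

    /** Minimal stand-in for a token: term text plus position increment. */
    public static final class Tok {
        public final String text;
        public final int posInc;

        public Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        public String toString() {
            return text + "/" + posInc;
        }
    }

    /**
     * Removes stopwords. When removeStopwordCollocates is true, every token
     * sharing a position with a stopword (i.e. in the same run of tokens
     * linked by a position increment of 0) is removed as well.
     */
    public static List filter(List tokens, Set stopWords, boolean removeStopwordCollocates) {
        List out = new ArrayList();
        int start = 0;
        while (start < tokens.size()) {
            // One position group: a token plus any following tokens with posInc == 0.
            int end = start + 1;
            while (end < tokens.size() && ((Tok) tokens.get(end)).posInc == 0) {
                end++;
            }
            boolean groupHasStopword = false;
            for (int j = start; j < end; j++) {
                if (stopWords.contains(((Tok) tokens.get(j)).text)) {
                    groupHasStopword = true;
                }
            }
            for (int j = start; j < end; j++) {
                Tok t = (Tok) tokens.get(j);
                boolean drop = stopWords.contains(t.text)
                        || (removeStopwordCollocates && groupHasStopword);
                if (!drop) {
                    out.add(t);
                }
            }
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        Set stop = new HashSet(Arrays.asList(new String[] {"the"}));
        // "THE" stands in for a variant emitted at the same position (posInc 0).
        List in = Arrays.asList(new Tok[] {
            new Tok("the", 1), new Tok("THE", 0), new Tok("plane", 1)});
        System.out.println(filter(in, stop, false)); // [THE/0, plane/1]
        System.out.println(filter(in, stop, true));  // [plane/1]
    }
}
```

With the flag off, "THE" survives with an increment of 0 and therefore attaches to whatever token was kept before it, which is the situation the issue reports. With the flag on, the whole position group is dropped, in either direction from the stopword.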
Re: using a french specific analyser without stemming
Bonjour Mathieu,

Mathieu Lecarme wrote:
> So, I make a specific filter (ElisionFilter), how can I give it to Lucene? With a Jira ticket, with the mailing list?

Here's a good place to start: http://wiki.apache.org/jakarta-lucene/HowToContribute

FYI, 99% of modifications/additions to Lucene begin life as JIRA issues.

--
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
small API change to FieldInfos before 2.2 is released
Hi,

I have a small API change that I've added in my patch for LUCENE-843. It just changes two add methods in FieldInfos to return the FieldInfo instance for the added field, instead of void.

One of the methods is private, so that should be fine. The other one is public, but it was added after 2.1 and so hasn't been released yet, which means we can still change it before releasing 2.2 without breaking backwards compatibility.

If there are no objections I will commit this soon.

Diffs:

Index: src/java/org/apache/lucene/index/FieldInfos.java
===
--- src/java/org/apache/lucene/index/FieldInfos.java    (revision 544145)
+++ src/java/org/apache/lucene/index/FieldInfos.java    (working copy)
@@ -174,12 +174,12 @@
    * @param omitNorms true if the norms for the indexed field should be omitted
    * @param storePayloads true if payloads should be stored for this field
    */
-  public void add(String name, boolean isIndexed, boolean storeTermVector,
-                  boolean storePositionWithTermVector, boolean storeOffsetWithTermVector,
-                  boolean omitNorms, boolean storePayloads) {
+  public FieldInfo add(String name, boolean isIndexed, boolean storeTermVector,
+                       boolean storePositionWithTermVector, boolean storeOffsetWithTermVector,
+                       boolean omitNorms, boolean storePayloads) {
     FieldInfo fi = fieldInfo(name);
     if (fi == null) {
-      addInternal(name, isIndexed, storeTermVector, storePositionWithTermVector, storeOffsetWithTermVector, omitNorms, storePayloads);
+      return addInternal(name, isIndexed, storeTermVector, storePositionWithTermVector, storeOffsetWithTermVector, omitNorms, storePayloads);
     } else {
       if (fi.isIndexed != isIndexed) {
         fi.isIndexed = true;                      // once indexed, always index
@@ -201,17 +201,18 @@
       }
     }
+    return fi;
   }
-
-  private void addInternal(String name, boolean isIndexed,
-                           boolean storeTermVector, boolean storePositionWithTermVector,
-                           boolean storeOffsetWithTermVector, boolean omitNorms, boolean storePayloads) {
+  private FieldInfo addInternal(String name, boolean isIndexed,
+                                boolean storeTermVector, boolean storePositionWithTermVector,
+                                boolean storeOffsetWithTermVector, boolean omitNorms, boolean storePayloads) {
     FieldInfo fi = new FieldInfo(name, isIndexed, byNumber.size(), storeTermVector, storePositionWithTermVector,
                                  storeOffsetWithTermVector, omitNorms, storePayloads);
     byNumber.add(fi);
     byName.put(name, fi);
+    return fi;
   }
   public int fieldNumber(String fieldName) {

Mike
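The "return the added instance" pattern from the diff can be shown in isolation. This is a simplified, hypothetical registry, not Lucene's FieldInfos: the class `FieldRegistrySketch` and its single `isIndexed` flag are assumptions that mirror only the control flow of the patch (look up, create via `addInternal` if missing, otherwise merge flags, and return the entry either way).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified, hypothetical registry in the spirit of FieldInfos after the
 * change: add(...) returns the (new or existing) entry instead of void.
 */
public class FieldRegistrySketch {

    /** Stand-in for FieldInfo with only a name, a number and one flag. */
    public static final class Info {
        public final String name;
        public final int number;
        public boolean isIndexed;

        Info(String name, int number, boolean isIndexed) {
            this.name = name;
            this.number = number;
            this.isIndexed = isIndexed;
        }
    }

    private final List<Info> byNumber = new ArrayList<Info>();
    private final Map<String, Info> byName = new HashMap<String, Info>();

    /** Adds the field, or merges flags into the existing entry, and returns it. */
    public Info add(String name, boolean isIndexed) {
        Info fi = byName.get(name);
        if (fi == null) {
            return addInternal(name, isIndexed);
        }
        if (fi.isIndexed != isIndexed) {
            fi.isIndexed = true; // once indexed, always indexed
        }
        return fi;
    }

    private Info addInternal(String name, boolean isIndexed) {
        Info fi = new Info(name, byNumber.size(), isIndexed);
        byNumber.add(fi);
        byName.put(name, fi);
        return fi;
    }

    public int size() {
        return byNumber.size();
    }
}
```

A caller can use the returned entry directly (for example its field number) instead of doing a second lookup by name after calling `add`.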
Re: small API change to FieldInfos before 2.2 is released
Michael McCandless wrote:
> The other one is public, but, was added after 2.1 and so hasn't been released yet, so we can still change it before releasing 2.2 without breaking backwards compatibility.

+1. If we are going to change the public method anyway in the future, then I agree that it makes more sense to do it before 2.2 is out, since the method hasn't been released yet. Otherwise we'll have an API change in 2.3.
Re: Lucene 2.2 soon?
Hi Jukka,

> Big +1 from me! We're doing a big 1.4 release of Jackrabbit in a few months and many of the improvements you listed would be very much welcome.

Cool!

> PS. When doing 2.2, it would be nice if you could upload the release artifacts also in the Maven repository. See the instructions in http://wiki.apache.org/jakarta-lucene/ReleaseTodo. Lucene 2.1 not being in the Maven repository is the main blocker for Jackrabbit not to upgrade right away.

We're already working on getting the upload into the Maven repository done right this time. (See https://issues.apache.org/jira/browse/LUCENE-622)

- Michael
[jira] Commented: (LUCENE-622) Provide More of Lucene For Maven
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501305 ] Hoss Man commented on LUCENE-622:

I didn't review Karl's attachment (the nice thing about patches is that you can read them easily in a web browser; tgz files need to be downloaded and uncompressed).

> This whole bolting on of Maven to ANT just seems a little weird to me

... that's kind of funny, Grant: it was your suggestion that prompted the approach in Sami's patch... http://www.nabble.com/Maven-artifacts-for-Lucene.*-tf3551707.html#a9941458

> Couldn't we just add various ANT targets that package the jars per the Maven way, and even copy them to the appropriate places? I wonder how hard it would be to have ANT output the POM and create Maven Jars.

Personally I'm much more in favor of baby steps (add an ant task to prepare the maven artifacts) than completely throwing out the build system and starting over from scratch with maven. ... If we have maven artifacts and people really start using them, then it might make sense to revisit the "migrate to maven" issue ... for now, though, it seems like everyone is comfortable with ant, and not everyone knows/understands maven.
Re: Lucene 2.2 soon?
Hi,

On 6/4/07, Michael Busch [EMAIL PROTECTED] wrote:
> We're already working on getting the upload into the Maven repository done right this time. (See https://issues.apache.org/jira/browse/LUCENE-622)

Nice, thanks a lot to everyone involved!

BR,

Jukka Zitting
[jira] Updated: (LUCENE-903) FilteredQuery explanation inaccuracy with boost
[ https://issues.apache.org/jira/browse/LUCENE-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-903:
Attachment: lucene-903.patch

Thanks for the review, Hoss! I like your changes to the patch. I modified it further:
- do a deep check in TestBoostingTermQuery.java
- remove the call to checkExplanations from TestExplanations.qtest(), because it is also called in QueryUtils.checkQuery() (otherwise it would be done twice).

I am OK with the original checkExplanations() remaining shallow, since all core tests now call the deep version. I intend to commit this shortly.

FilteredQuery explanation inaccuracy with boost
Key: LUCENE-903
URL: https://issues.apache.org/jira/browse/LUCENE-903
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.2
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
Fix For: 2.2
Attachments: lucene-903.patch, lucene-903.patch, lucene-903.patch

The value of the explanation is different from the product of its parts if boost 1. This was exposed by tightening the explanation check (part of LUCENE-446).
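The distinction between a shallow and a deep explanation check can be illustrated with a small, Lucene-independent sketch. The `Expl` class, the `check` method, and the convention that a node's description ends in "product of:" or "sum of:" are assumptions for illustration; this is not the Lucene test code named above. The example tree mimics this issue: a boost of 2 is listed as a detail but not applied to the node's value.

```java
import java.util.ArrayList;
import java.util.List;

public class ExplanationCheckSketch {

    /** Hypothetical stand-in for an explanation node. */
    public static final class Expl {
        public final float value;
        public final String description;
        public final List<Expl> details = new ArrayList<Expl>();

        public Expl(float value, String description) {
            this.value = value;
            this.description = description;
        }

        public Expl add(Expl detail) {
            details.add(detail);
            return this;
        }
    }

    /**
     * Checks that a node's value matches the product or sum of its details,
     * based on the description suffix. With deep == true, every descendant is
     * checked too; otherwise only the top-level node is.
     */
    public static boolean check(Expl e, boolean deep, float tolerance) {
        if (e.details.isEmpty()) {
            return true;
        }
        float expected = e.value; // other combinations are not verified in this sketch
        if (e.description.endsWith("product of:")) {
            expected = 1f;
            for (Expl d : e.details) {
                expected *= d.value;
            }
        } else if (e.description.endsWith("sum of:")) {
            expected = 0f;
            for (Expl d : e.details) {
                expected += d.value;
            }
        }
        if (Math.abs(expected - e.value) > tolerance) {
            return false;
        }
        if (deep) {
            for (Expl d : e.details) {
                if (!check(d, true, tolerance)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Inner node forgets to apply its boost: 0.5 * 2.0 should be 1.0.
        Expl inner = new Expl(0.5f, "filtered, product of:")
                .add(new Expl(0.5f, "query score"))
                .add(new Expl(2.0f, "boost"));
        Expl root = new Expl(0.5f, "sum of:").add(inner);
        System.out.println("shallow: " + check(root, false, 1e-4f));
        System.out.println("deep: " + check(root, true, 1e-4f));
    }
}
```

The shallow check only compares the root with its direct children, so it accepts this tree; the deep check recurses and catches the missing boost.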
[jira] Commented: (LUCENE-622) Provide More of Lucene For Maven
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501326 ] Sami Siren commented on LUCENE-622:

Thanks, Hoss, for the feedback in general. I am now a bit confused: is there a consensus on how to proceed with this? (In other words, should I change things as suggested, or are we going to take the Maven approach?)

> some other questions:
> a) it doesn't seem like all contribs are accounted for ... were some excluded intentionally?

Some were excluded intentionally because their dependencies are not available in public repositories (gdata, db); some were just excluded (javascript, lucli, ant).

> (my worry is that overtime a contrib changes, it's POM doesn't get updated, and we have a release in which the maven artifacts are incorrect (which in my opinion is worse then not having any maven artifacts at all)

For 1.9.1 the jar itself was faulty: http://jira.codehaus.org/browse/MEV-449
[jira] Commented: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501338 ] Michael Busch commented on LUCENE-904:

> doesn't work when people use md5sum -c to try and check the sums.

True, I just checked it. md5sum -c emits "no properly formatted MD5 checksum lines found" for the .md5 files that the checksum task produces without the format attribute.

> (checksum has a format attribute now, but that wasn't added until ~Dec2006, so I think it requires ant 1.7)

Yes, it requires 1.7.

> the way we solved the problem was with the solr-checksum macro added to our build.xml in this commit...

Looks good! Mind me stealing the macro?

Calculate MD5 checksums in target dist-all
Key: LUCENE-904
URL: https://issues.apache.org/jira/browse/LUCENE-904
Project: Lucene - Java
Issue Type: Improvement
Components: Build
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Trivial
Fix For: 2.2
Attachments: lucene-904.patch

Trivial patch that extends the ant target dist-all to calculate the MD5 checksums for the dist files.
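`md5sum -c` expects each line to be the 32-digit hex digest, two spaces, and the file name, so a file holding only the bare digest is rejected. The sketch below shows that line format using only the JDK. It is a hypothetical illustration (the class name `Md5SumLine` is made up), not the solr-checksum macro or Ant's checksum task, and it reads each file fully into memory, which is acceptable for a sketch but not for very large files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5SumLine {

    /** Returns "<32 lowercase hex digits><two spaces><fileName>", as md5sum prints it. */
    public static String md5sumLine(byte[] data, String fileName) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] digest = md5.digest(data);
        StringBuilder sb = new StringBuilder(digest.length * 2 + 2 + fileName.length());
        for (byte b : digest) {
            // Two lowercase hex digits per byte, high nibble first.
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.append("  ").append(fileName).toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Md5SumLine lucene-x.y.z.zip > lucene-x.y.z.zip.md5
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(md5sumLine(Files.readAllBytes(p), p.getFileName().toString()));
        }
    }
}
```

Writing only the base file name (not the full path) lets the .md5 file be verified from the directory where the archive is downloaded.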
[jira] Commented: (LUCENE-622) Provide More of Lucene For Maven
[ https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501334 ] Michael Busch commented on LUCENE-622:

> Personally I'm much more in favor of baby steps (add an ant task to prepare the maven artifacts) then completely throwing out the build system and starting over from scratch with maven.

I agree with Hoss here. I *strongly* discourage switching from ant to maven now, before 2.2 is out. I would like to keep the ant build for now and add a target that generates the POM files for the maven upload. After 2.2 is out, though, we should discuss again whether it makes sense to switch to maven. Then we would have enough time to thoroughly test the new build system before the next release.
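Generating every POM from one template is one way to address the worry that a contrib's POM goes stale. The sketch below is written in Java only to keep all examples in one language; the proposal itself is an ant target. The class `PomSketch`, the `org.apache.lucene` groupId, the `lucene-<name>` artifact naming (suggested by attachments such as lucene-core.pom and lucene-highlighter-2.0.0.pom) and the version string are assumptions for illustration.

```java
/**
 * Hypothetical helper: renders a minimal Maven 2 POM for a Lucene artifact
 * from a single template, so per-contrib POMs cannot drift apart.
 */
public class PomSketch {

    public static String pom(String artifactId, String version, boolean dependsOnCore) {
        StringBuilder sb = new StringBuilder();
        sb.append("<project>\n");
        sb.append("  <modelVersion>4.0.0</modelVersion>\n");
        sb.append("  <groupId>org.apache.lucene</groupId>\n");
        sb.append("  <artifactId>").append(artifactId).append("</artifactId>\n");
        sb.append("  <version>").append(version).append("</version>\n");
        sb.append("  <packaging>jar</packaging>\n");
        if (dependsOnCore) {
            // Contribs depend on the core jar of the same release.
            sb.append("  <dependencies>\n");
            sb.append("    <dependency>\n");
            sb.append("      <groupId>org.apache.lucene</groupId>\n");
            sb.append("      <artifactId>lucene-core</artifactId>\n");
            sb.append("      <version>").append(version).append("</version>\n");
            sb.append("    </dependency>\n");
            sb.append("  </dependencies>\n");
        }
        sb.append("</project>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(pom("lucene-core", "2.2.0", false));
        System.out.print(pom("lucene-highlighter", "2.2.0", true));
    }
}
```

Because the version is passed in once per release, a contrib cannot end up with a POM that still points at an older core version.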
[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501344 ] Steven Parkes commented on LUCENE-848:

It looks like the latest successful dump is http://download.wikimedia.org/enwiki/20070527/enwiki-20070527-pages-articles.xml.bz2

If you copy it wherever, I'll fetch it from there and test it.

Add supported for Wikipedia English as a corpus in the benchmarker stuff
Key: LUCENE-848
URL: https://issues.apache.org/jira/browse/LUCENE-848
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/benchmark
Reporter: Steven Parkes
Assignee: Grant Ingersoll
Priority: Minor
Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java, xerces.jar, xerces.jar, xml-apis.jar

Add support for using Wikipedia for benchmarking.
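The pages-articles dump is one large XML file of `<page>` elements, each with a `<title>` and a `<revision>` containing `<text>`. Below is a sketch of streaming those pages with the JDK's StAX API instead of loading the whole file. It is hypothetical and not the benchmark contrib's Wikipedia support: the class and method names are made up, and since the JDK has no bzip2 decoder, it assumes the dump has already been decompressed (e.g. with bunzip2) or is decoded by a separate library.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WikiDumpSketch {

    /** One article: its title and raw wiki-markup text. */
    public static final class Page {
        public final String title;
        public final String text;

        Page(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Streams each <page> to the sink as soon as it is complete, so memory use stays flat. */
    public static void readPages(Reader in, Consumer<Page> sink) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                String title = null;
                String text = null;
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // getLocalName() ignores the namespace declared on <mediawiki>.
                        String name = r.getLocalName();
                        if (name.equals("page")) {
                            title = null;
                            text = null;
                        } else if (name.equals("title")) {
                            title = r.getElementText();
                        } else if (name.equals("text")) {
                            text = r.getElementText();
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && r.getLocalName().equals("page")) {
                        sink.accept(new Page(title, text == null ? "" : text));
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed dump XML", e);
        }
    }
}
```

Passing each page to a callback, rather than returning a list, keeps the sketch usable on a multi-gigabyte dump: each page can be turned into a benchmark document and then discarded.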
[jira] Commented: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501350 ] Hoss Man commented on LUCENE-904:

You can't steal the macro, but it is licensed under the ASL v2.0 (and, more explicitly: when I wrote it and committed it to the Solr repository, I licensed it to the ASF for inclusion in ASF works as per the Apache Software License §5).
Re: [jira] Commented: (LUCENE-622) Provide More of Lucene For Maven
On Jun 4, 2007, at 2:21 PM, Hoss Man (JIRA) wrote:
> > This whole bolting on of Maven to ANT just seems a little weird to me
> ... that's kind of funny Grant, it was your suggestion that prompted the approach in Sami's patch...
> http://www.nabble.com/Maven-artifacts-for-Lucene.*-tf3551707.html#a9941458
> > Couldn't we just add various ANT targets that package the jars per the Maven way, and even copy them to the appropriate places? I wonder how hard it would be to have ANT output the POM and create Maven Jars.

Uh, temporary insanity today, I guess. :-) It just struck me as funny today. I think when I saw that Karl had put in a patch claiming to have almost all of it in place, I figured we might as well jump in.

> Personally I'm much more in favor of baby steps (add an ant task to prepare the maven artifacts) then completely throwing out the build system and starting over from scratch with maven. ... if we have maven artifacts and people really start using them, then it might make sense to revist the migrate to maven issue ... for now though it seems like everyone is comfortable with ant, and not every one knows/understands maven.

Right. I guess I was suggesting parallel support for a little while, but there really would never be any incentive to force the issue. My Maven 2 experience has been mixed to date. We are in the process of upgrading from M1. In a lot of ways, things are much better than both ANT and M1, but in a lot of ways it just isn't there yet, and it has been frustrating getting help and finding documentation. For the usual cases I think M2 works great, but in less common cases it isn't so great. Lucene probably fits into the usual cases, so we would probably be fine.

At any rate, sorry for the confusion. I am fine w/ whichever way works out.
[jira] Resolved: (LUCENE-903) FilteredQuery explanation inaccuracy with boost
[ https://issues.apache.org/jira/browse/LUCENE-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-903.
Resolution: Fixed

Committed the last patch; thanks for your help, Hoss.
[jira] Commented: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501380 ] Michael Busch commented on LUCENE-904:

Well, yes, that's what I meant. As long as code is licensed to the ASF for inclusion in ASF works, it can be copied from project A to project B, as long as both A and B are licensed under the same ASL, right?
[jira] Commented: (LUCENE-446) search.function - (1) score based on field value, (2) simple score customizability
[ https://issues.apache.org/jira/browse/LUCENE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501396 ] Doron Cohen commented on LUCENE-446: ok, so I will add in the two ord classes in, so that Solr can move to use this package. search.function - (1) score based on field value, (2) simple score customizability -- Key: LUCENE-446 URL: https://issues.apache.org/jira/browse/LUCENE-446 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Yonik Seeley Assignee: Doron Cohen Priority: Minor Fix For: 2.2 Attachments: function.patch.txt, function.patch.txt, function.zip, function.zip FunctionQuery can return a score based on a field's value or on it's ordinal value. FunctionFactory subclasses define the details of the function. There is currently a LinearFloatFunction (a line specified by slope and intercept). Field values are typically obtained from FieldValueSourceFactory. Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene 2.2 soon?
Chris Hostetter wrote on 03/06/2007 20:44:09:
> : The part I am not sure of in this regard are my changes
> : to FieldCache and FieldcacheImpl, while LUCENE-831 is
> : ongoing too. (though btw 831 doesn't apply cleanly on
> : current trunk). (It is my plan to get into LUCENE-831
> : but I haven't got to it yet.)
>
> 1) i updated 831 to work on the trunk
> 2) the spirt of 831 is to change the underlying impl but still be backwards compatible with the FieldCache API ... i skimmed the FieldCache API changes in the latest patch on 446, and they seem to just be adding new get methods (getShort, and getBytes) on the existing underlying impl .. that should be easily supported by the stuff in 831 (and easy to convert/migrate the same way i did the getInt/getLong/getString methods)

Indeed, this is the only change. So I will add this now, and 831 will have to wait for the next release, I guess.
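Adding new typed getters on top of an existing per-field cache, as described above, can be sketched as follows. This is a much-simplified, hypothetical illustration and not Lucene's FieldCache: the method names echo the getters mentioned in the thread, but the signatures, the in-memory map of raw string values, and the rule that a document without a value gets 0 are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, much-simplified field cache: for each field and target type,
 * the per-document string values are parsed once and the array is cached.
 */
public class FieldCacheSketch {

    /** Raw per-document string values for each field; a null entry means "no value". */
    private final Map<String, String[]> rawValues;
    /** Parsed arrays, keyed by field name plus target type. */
    private final Map<String, Object> cache = new HashMap<String, Object>();

    public FieldCacheSketch(Map<String, String[]> rawValues) {
        this.rawValues = rawValues;
    }

    public synchronized short[] getShorts(String field) {
        String key = field + "/short";
        short[] cached = (short[]) cache.get(key);
        if (cached == null) {
            String[] raw = rawFor(field);
            cached = new short[raw.length];
            for (int doc = 0; doc < raw.length; doc++) {
                cached[doc] = raw[doc] == null ? 0 : Short.parseShort(raw[doc]);
            }
            cache.put(key, cached);
        }
        return cached;
    }

    public synchronized byte[] getBytes(String field) {
        String key = field + "/byte";
        byte[] cached = (byte[]) cache.get(key);
        if (cached == null) {
            String[] raw = rawFor(field);
            cached = new byte[raw.length];
            for (int doc = 0; doc < raw.length; doc++) {
                cached[doc] = raw[doc] == null ? 0 : Byte.parseByte(raw[doc]);
            }
            cache.put(key, cached);
        }
        return cached;
    }

    private String[] rawFor(String field) {
        String[] raw = rawValues.get(field);
        if (raw == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return raw;
    }
}
```

Keying the cache by field and type means the new getters only add parsing code; the caching mechanism underneath stays the same, which is why such additions are easy to carry over to a reworked implementation.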
Re: Lucene 2.2 soon?
: So I will add this now, and 831 would wait I guess for the next release.

831 should definitely not hold up 2.2 ... I wrote it, and even I'm not certain that it's the right way to go.

-Hoss
[jira] Commented: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501397 ] Hoss Man commented on LUCENE-904:

(For the record: in my last comment I was attempting to be facetious ... I clearly failed.)

IANAL, but as I understand it you are correct; if not, then a lot of Apache projects are already in a lot of trouble.
[jira] Commented: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501402 ] Michael Busch commented on LUCENE-904:

Actually, my "steal" sentence was supposed to be facetious too :-) I guess we have to improve our communication (<-- this is supposed to be funny, too).

Alright, I will submit a new patch shortly.
[jira] Updated: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-904: - Attachment: lucene-904.patch OK, here is the patch with the macro. And indeed, md5sum -c works fine with the .md5 files. I'm planning on committing this patch soon.
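The patch computes the checksums with an Ant macro. For testers on a platform without md5sum, a minimal Java sketch of the same check follows; it assumes each .md5 file begins with the hex digest (optionally followed by whitespace and the file name, as md5sum writes it). The class and method names are hypothetical and are not part of the patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for checking dist files against .md5 files; not part of LUCENE-904. */
class Md5SumSketch {

    /** Hex-encoded MD5 digest of everything readable from the stream. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** One line in md5sum's text layout: digest, two spaces, file name. */
    static String md5SumLine(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in) + "  " + file.getFileName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the digest at the start of an md5sum-style line equals actualHex. */
    static boolean lineMatches(String md5SumLine, String actualHex) {
        String expected = md5SumLine.trim().split("\\s+")[0];
        return expected.equalsIgnoreCase(actualHex);
    }
}
```

Usage: compare `Md5SumSketch.md5Hex(...)` of the downloaded zip with the first token of its .md5 file via `lineMatches`.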
[jira] Resolved: (LUCENE-904) Calculate MD5 checksums in target dist-all
[ https://issues.apache.org/jira/browse/LUCENE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-904. -- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Committed.
Re: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
For now it is at http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages-articles.xml.bz2 Does Ant's get task work with redirects? I may eventually move this. I am trying to find the old message and responses from Infrastructure saying where this should go. The original suggestion was zones, but that only has Tomcat on it and I don't want to consume those resources. I can probably just update the patch, so no need to submit a new one unless you want to. -Grant On Jun 4, 2007, at 4:18 PM, Steven Parkes (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501344 ] Steven Parkes commented on LUCENE-848: -- It looks like the latest successful dump is http://download.wikimedia.org/enwiki/20070527/enwiki-20070527-pages-articles.xml.bz2 If you copy it wherever, I'll fetch it from there and test it. Add supported for Wikipedia English as a corpus in the benchmarker stuff Key: LUCENE-848 URL: https://issues.apache.org/jira/browse/LUCENE-848 Project: Lucene - Java Issue Type: New Feature Components: contrib/benchmark Reporter: Steven Parkes Assignee: Grant Ingersoll Priority: Minor Attachments: LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java, xerces.jar, xerces.jar, xml-apis.jar Add support for using Wikipedia for benchmarking.
Re: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
: Does ANT get work with redirects? I may eventually move this. I am : trying to find the old message and responses from Infrastructure : saying where this should go. The original suggestion was zones, but : that only has Tomcat on it and I don't want to consume those resources. google finds no record of any posting to infrastructure-issues with mention of wikipedia. you may be thinking of the thread you had on legal-discuss... http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200704.mbox/[EMAIL PROTECTED] the initial advice there was... As long as you do not distribute the Wikipedia database in a Lucene release and just have a copy hosted on your Lucene zone or something similar so that committers can get at it easily, I don't see a ...but there was a followup comment suggesting... Is there some reason you can't distribute it as an overlay package that can be optionally downloaded by developers who intend to do a larger test? ...i'm not sure what that means, but the person making that suggestion didn't seem to understand that using the corpus would be above and beyond the normal build process. -Hoss
Re: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
I thought I asked a follow-up on [EMAIL PROTECTED] (which doesn't seem to be archived publicly and is different from infra-issues; of course, with all the email being sent around, I may just be imagining that I sent it). I inquired there today as to where I can find archives of it, so if you have a pointer to it, let me know. See below On Jun 4, 2007, at 9:26 PM, Chris Hostetter wrote: : Does ANT get work with redirects? I may eventually move this. I am : trying to find the old message and responses from Infrastructure : saying where this should go. The original suggestion was zones, but : that only has Tomcat on it and I don't want to consume those resources. google finds no record of any posting to infrastructure-issues with mention of wikipedia. you may be thinking of the thread you had on legal-discuss... http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200704.mbox/[EMAIL PROTECTED] the initial advice there was... As long as you do not distribute the Wikipedia database in a Lucene release and just have a copy hosted on your Lucene zone or something similar so that committers can get at it easily, I don't see a ...but there was a followup comment suggesting... Is there some reason you can't distribute it as an overlay package that can be optionally downloaded by developers who intend to do a larger test? ...i'm not sure what that means, but the person making that suggestion didn't seem to understand that using the corpus would be above and beyond the normal build process. Yeah, I am not sure what is meant either. At any rate, I think we are fine with the link above for now. I know it could go on zones, but it is awfully big and I don't want it to be a hog on that machine. Plus, I would either have to serve it using the Tomcat install on there, or install httpd. Thanks for following up, Chris. -Grant
Please help testing the release files
Hi Team, in our Lucene 2.1 release we had several problems with our release files: - build.xml in the binary release didn't work - demos couldn't be built even though the demo sources were included in the binaries - some contrib modules couldn't be built, or test cases failed for some contribs - LICENSE.TXT and NOTICE.TXT weren't included in the META-INF dir Thanks to some recently committed patches and great help from other committers, all of the problems mentioned above should be fixed now. I just built the release files from a current svn checkout and uploaded them to http://people.apache.org/~buschmi/staging_area/lucene/. Note that this is not a release candidate yet. I would like to ask everyone to help test the build so that, this time, we find any additional bugs earlier in the release process. So please help test the release files on different platforms with different JVM versions. Thanks to everyone in advance, - Michael
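One of the 2.1 problems listed above (LICENSE.TXT and NOTICE.TXT missing from META-INF) can be checked mechanically on each built jar. The following is a minimal Java sketch using java.util.zip.ZipFile, offered as a testing aid only. The class name is hypothetical, and the entry names are assumed to be exactly META-INF/LICENSE.TXT and META-INF/NOTICE.TXT, matched case-sensitively.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

/** Hypothetical release-testing helper; entry names are assumptions based on the list above. */
class JarNoticeCheck {

    static final String[] REQUIRED = {"META-INF/LICENSE.TXT", "META-INF/NOTICE.TXT"};

    /** Returns the required entries that are absent from the given jar (empty list = OK). */
    static List<String> missingEntries(File jar) {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jar)) {
            for (String name : REQUIRED) {
                if (zip.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return missing;
    }
}
```

Running it over every jar in an unpacked lucene-2.2-dev.zip would report any jar that is missing either file.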
[jira] Updated: (LUCENE-446) search.function - (1) score based on field value, (2) simple score customizability
[ https://issues.apache.org/jira/browse/LUCENE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-446: --- Attachment: function.patch.txt Updated patch: - fixes explanation and toString() issues. - adds the Ord and ReverseOrd valueSource classes that are in use in Solr - warns in the javadocs about the experimental state of this package Javadocs were updated at http://people.apache.org/~doronc/api I will commit this later today if there are no objections. search.function - (1) score based on field value, (2) simple score customizability -- Key: LUCENE-446 URL: https://issues.apache.org/jira/browse/LUCENE-446 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Yonik Seeley Assignee: Doron Cohen Priority: Minor Fix For: 2.2 Attachments: function.patch.txt, function.patch.txt, function.patch.txt, function.zip, function.zip FunctionQuery can return a score based on a field's value or on its ordinal value. FunctionFactory subclasses define the details of the function. There is currently a LinearFloatFunction (a line specified by slope and intercept). Field values are typically obtained from FieldValueSourceFactory. Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource.
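To make the "linear function of a field value" and "ordinal value" ideas concrete, here is a self-contained Java model. It is not the LUCENE-446 or Solr code, and it uses none of their classes. It assumes every document has exactly one value and numbers ordinals from 1 in ascending sort order. The real Ord/ReverseOrd value sources may number differently, for example in how documents without a value are handled.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical model of field-value-based scoring; not the search.function API. */
class FieldValueScoringSketch {

    /** Linear function of a numeric field value: score = slope * value + intercept. */
    static float[] linear(float[] fieldValues, float slope, float intercept) {
        float[] scores = new float[fieldValues.length];
        for (int doc = 0; doc < fieldValues.length; doc++) {
            scores[doc] = slope * fieldValues[doc] + intercept;
        }
        return scores;
    }

    /** Ord: 1-based rank of each document's value among the distinct values, ascending. */
    static int[] ord(String[] fieldValues) {
        String[] distinct = new TreeSet<>(Arrays.asList(fieldValues)).toArray(new String[0]);
        int[] ords = new int[fieldValues.length];
        for (int doc = 0; doc < fieldValues.length; doc++) {
            // distinct is sorted in natural String order, so binary search finds the rank.
            ords[doc] = Arrays.binarySearch(distinct, fieldValues[doc]) + 1;
        }
        return ords;
    }

    /** ReverseOrd: the largest value gets 1 and the smallest gets the number of distinct values. */
    static int[] reverseOrd(String[] fieldValues) {
        int[] ords = ord(fieldValues);
        int numDistinct = new TreeSet<>(Arrays.asList(fieldValues)).size();
        int[] reversed = new int[ords.length];
        for (int doc = 0; doc < ords.length; doc++) {
            reversed[doc] = numDistinct + 1 - ords[doc];
        }
        return reversed;
    }
}
```

Ordinal-based scores depend only on a value's rank, not its magnitude. That makes them usable on string fields such as dates, at the cost of ignoring how far apart two values are.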
[jira] Updated: (LUCENE-902) Check on PositionIncrement with StopFilter.
[ https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toru Matsuzawa updated LUCENE-902: -- Attachment: stopfilter20070605.patch Hi Steven, Thank you for pointing out the problem. The corrected patch is attached (stopfilter20070605.patch). I think either #1 or #2 is acceptable; I leave the choice of solution to the committer. I hope this problem can be solved at an early stage.
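LUCENE-902 reports that when StopFilter removes a stop word, any tokens stacked on it (position increment 0, i.e. at the same position) survive. The self-contained Java model below shows the behaviour the reporter asks for. It is not Lucene's StopFilter and not the attached patch. It models tokens with a hypothetical `Tok` class instead of Lucene's Token, and it omits position-increment bookkeeping for the surviving tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical token model; Lucene's Token class is not used here. */
class Tok {
    final String text;
    final int posInc; // 0 means "same position as the previous token"

    Tok(String text, int posInc) {
        this.text = text;
        this.posInc = posInc;
    }
}

/** Sketch of stop filtering that also drops tokens stacked on a removed stop word. */
class StackedStopFilterSketch {

    static List<String> filter(List<Tok> in, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        boolean droppingStack = false; // true while we are at the position of a removed stop word
        for (Tok t : in) {
            if (t.posInc == 0 && droppingStack) {
                continue; // stacked on a removed stop word: remove it too
            }
            if (stopWords.contains(t.text)) {
                // Only a stop word that starts a new position opens a dropped stack;
                // a stop word stacked on a kept token is removed on its own.
                if (t.posInc > 0) {
                    droppingStack = true;
                }
                continue;
            }
            droppingStack = false;
            out.add(t.text);
        }
        return out;
    }
}
```

For example, with stop word "the", the stream the(1) THE(0) quick(1) fast(0) fox(1) yields quick, fast, fox. THE is dropped because it shares the removed stop word's position, while the synonym fast is kept.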
Re: Please help testing the release files
Michael Busch wrote on 04/06/2007 18:59:49: So please help testing the release files on different platforms with different JVM versions. Checked with jdk 1.4 on Win/XP, found no problems: lucene-2.2-dev.zip: + md5: OK + LICENSE.TXT: OK + NOTICE.TXT: OK + ant: OK + ant jar-demo: OK + ant war-demo: OK lucene-2.2-dev-src.zip + md5: OK + LICENSE.TXT: OK + NOTICE.TXT: OK + ant clean test: OK Doron
Re: Lucene 2.2 soon?
Michael, I updated LUCENE-446, including these warnings. Is 2.2 still open for adding this? Hi Doron, yes it is. I just sent a note to java-dev with a possible schedule for the 2.2 release, in which I suggest a feature freeze starting Wednesday. So features can still be committed until the end of tomorrow (Tuesday). - Michael
Re: Lucene 2.2 soon?
Michael Busch wrote on 01/06/2007 23:31:11: With 9 votes this is the most popular issue in Jira. I understand your concerns about the API. Maybe we should commit this with comments in the javadocs saying that this feature is in beta state and that the APIs might still be subject to change? Michael, I updated LUCENE-446, including these warnings. Is 2.2 still open for adding this? Thanks, Doron
Lucene 2.2 - Suggested schedule
Hello everyone, I'd like to suggest a schedule here for the Lucene 2.2 release: -- Feature freeze from Wednesday (06/06) All features must be checked in by the end of Tuesday. On Wednesday I will branch the trunk and we will have a feature freeze on the branch. After that, only Jira issues with Fix version 2.2 and priority Blocker can still be committed to the branch. Exceptions: the Maven patch LUCENE-622 and javadoc patches. Besides LUCENE-622 there is currently only one open issue in Jira with Fix version 2.2: LUCENE-446. Doron is planning to commit this today. So it seems that we are on track for a feature freeze on Wednesday. -- 10 days for javadoc improvements As suggested by Grant, we want to use the feature freeze to focus on improving our javadocs in addition to testing. I would like to ask everyone to contribute. Please open all javadoc patches with Fix version 2.2 and type Wish. All javadoc improvements should be checked in by Saturday (06/16). Javadoc issues in Jira that are still open after 06/16 won't block the 2.2 release. -- GA on Tuesday (06/19) On the weekend I will build a release candidate and call a release vote on java-dev. Once we have 3 binding +1 votes from PMC members, I will publish the files. Please let me know whether you agree with the details in this plan! - Michael
[jira] Created: (LUCENE-905) left nav of docs/index.html in dist artifacts links to hudson for javadocs
left nav of docs/index.html in dist artifacts links to hudson for javadocs -- Key: LUCENE-905 URL: https://issues.apache.org/jira/browse/LUCENE-905 Project: Lucene - Java Issue Type: Bug Components: Build Reporter: Hoss Man Priority: Minor Fix For: 2.2 When building the zip or tgz release artifacts, the docs/index.html file contained in that release (the starting point for people reading the documentation) links API Docs to http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/ instead of to ./api/index.html (the local copy of the javadocs). This relates to the initial migration to hudson for the nightly builds, and to a plan to copy the javadocs back to lucene.apache.org that wasn't considered urgent since it was just for transient nightly docs; but a side effect is that the release documentation also links to hudson. Even if we don't modify the nightly build process before the 2.2 release, we should update the link in the left nav in the 2.2 release branch before building the final release.
[jira] Commented: (LUCENE-905) left nav of docs/index.html in dist artifacts links to hudson for javadocs
[ https://issues.apache.org/jira/browse/LUCENE-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501443 ] Doug Cutting commented on LUCENE-905: - The link should be relative, to ./api/index.html, but ./api should redirect to zones. You can add a redirect to http://svn.apache.org/repos/asf/lucene/.htaccess, then run 'svn up' in /www/lucene.apache.org/.
[jira] Updated: (LUCENE-765) Index package level javadocs needs content
[ https://issues.apache.org/jira/browse/LUCENE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-765: - Fix Version/s: 2.2 Issue Type: Wish (was: Task) Index package level javadocs needs content -- Key: LUCENE-765 URL: https://issues.apache.org/jira/browse/LUCENE-765 Project: Lucene - Java Issue Type: Wish Components: Javadocs Reporter: Grant Ingersoll Priority: Minor Fix For: 2.2 The org.apache.lucene.index package level javadocs are sorely lacking. They should be updated to give a summary of the important classes, how indexing works, etc. Maybe give an overview of how the different writers coordinate. Links to file formats, information on the posting algorithm, etc. would be helpful. See the search package javadocs as a sample of the kind of info that could go here.