[jira] Updated: (LUCENE-1872) Improve javadocs for Numeric*
[ https://issues.apache.org/jira/browse/LUCENE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1872:

Attachment: LUCENE-1872-uwe.patch

Hi Mike, I made some small formatting improvements and also explained how precisionStep relates to the "brackets", which a reader would otherwise not understand. The term "bracket" does not appear anywhere else, so I added that the larger brackets are simply lower-precision representations of the original value. I also restored the link to NumericUtils, which had gotten lost; it describes the format (in the advanced section of NumericField).

I committed this, revision: 809284

> Improve javadocs for Numeric*
>
> Key: LUCENE-1872
> URL: https://issues.apache.org/jira/browse/LUCENE-1872
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
> Attachments: LUCENE-1872-uwe.patch, LUCENE-1872.patch, LUCENE-1872.patch, LUCENE-1872.patch
>
> I'm working on improving Numeric* javadocs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
[ https://issues.apache.org/jira/browse/LUCENE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749238#action_12749238 ]

Uwe Schindler commented on LUCENE-1862:

How about moving the package.html files one level lower, into the smartcn package? The package.html for the top-level analyzers documentation could then live only in the common contrib.

> duplicate package.html files in queryParser and analsysis.cn packages
>
> Key: LUCENE-1862
> URL: https://issues.apache.org/jira/browse/LUCENE-1862
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Hoss Man
> Priority: Minor
> Fix For: 2.9
>
> These files conflict with each other when building the javadocs. There can be only one (of each) ...
> {code}
> hoss...@brunner:~/lucene/java$ find src contrib -name package.html | perl -ple 's{.*src/java/}{}' | sort | uniq -c | grep -v " 1 "
>    2 org/apache/lucene/analysis/cn/package.html
>    2 org/apache/lucene/queryParser/package.html
> hoss...@brunner:~/lucene/java$ find src contrib -path \*queryParser/package.html
> src/java/org/apache/lucene/queryParser/package.html
> contrib/queryparser/src/java/org/apache/lucene/queryParser/package.html
> hoss...@brunner:~/lucene/java$ find src contrib -path \*cn/package.html
> contrib/analyzers/common/src/java/org/apache/lucene/analysis/cn/package.html
> contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/package.html
> {code}
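For anyone who would rather run the same check from Java than from the shell, here is a minimal sketch of what the find/perl/uniq pipeline in the issue does: strip everything up to the last "src/java/" and report relative paths that occur more than once. The class and method names are hypothetical and not part of the Lucene build; the fifth path in main() is only an illustrative non-duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the Lucene build. */
class DuplicatePackageHtml {

    /** Strips everything up to and including the last "src/java/", like the greedy perl regex. */
    static String relativeToSourceRoot(String path) {
        String marker = "src/java/";
        int i = path.lastIndexOf(marker);
        return i < 0 ? path : path.substring(i + marker.length());
    }

    /** Maps each relative path that occurs more than once to the files sharing it. */
    static Map<String, List<String>> findDuplicates(List<String> paths) {
        Map<String, List<String>> byRelative = new TreeMap<>();
        for (String p : paths) {
            byRelative.computeIfAbsent(relativeToSourceRoot(p), k -> new ArrayList<>()).add(p);
        }
        // Keep only the relative paths claimed by two or more files.
        byRelative.values().removeIf(files -> files.size() < 2);
        return byRelative;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "src/java/org/apache/lucene/queryParser/package.html",
            "contrib/queryparser/src/java/org/apache/lucene/queryParser/package.html",
            "contrib/analyzers/common/src/java/org/apache/lucene/analysis/cn/package.html",
            "contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/package.html",
            "src/java/org/apache/lucene/index/package.html"); // illustrative non-duplicate
        findDuplicates(paths).forEach((rel, files) ->
            System.out.println(files.size() + " " + rel));
    }
}
```

With the paths from the issue, this reports the same two conflicting relative paths that the shell pipeline lists.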
[jira] Commented: (LUCENE-1872) Improve javadocs for Numeric*
[ https://issues.apache.org/jira/browse/LUCENE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749239#action_12749239 ]

Michael McCandless commented on LUCENE-1872:

The new changes look good -- thanks Uwe!
[jira] Commented: (LUCENE-1872) Improve javadocs for Numeric*
[ https://issues.apache.org/jira/browse/LUCENE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749240#action_12749240 ]

Uwe Schindler commented on LUCENE-1872:

Oh I thought you were still sleeping... Good morning!
[jira] Created: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
Javadoc of TokenStream.end() somehow confusing

Key: LUCENE-1875
URL: https://issues.apache.org/jira/browse/LUCENE-1875
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 2.9
Reporter: Uwe Schindler
Fix For: 2.9

The Javadocs of TokenStream.end() are somewhat confusing because they also refer to the old TokenStream API ("after next() returned null"). But someone who implements a TokenStream with the old API cannot make use of the end() feature: such an implementation does not use attributes and so cannot update the end offsets (it could, but then the whole TokenStream should be rewritten). To conform to the old API there would have to be an end(Token) method, which we will not add.

I would drop the old API from these docs.
[jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
[ https://issues.apache.org/jira/browse/LUCENE-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1875:

Attachment: LUCENE-1875.patch

Patch with changed end() javadocs. This patch also removes the {@link TokenStream}s inside TokenStream.java (it does not make sense to link to the same doc page itself).
[jira] Assigned: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
[ https://issues.apache.org/jira/browse/LUCENE-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-1875:

Assignee: Uwe Schindler
[jira] Commented: (LUCENE-1872) Improve javadocs for Numeric*
[ https://issues.apache.org/jira/browse/LUCENE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749246#action_12749246 ]

Michael McCandless commented on LUCENE-1872:

Good morning!
Re: Parallel incremental indexing
Michael Busch wrote:
> Hi all,
>
> I just added a wiki page for a new feature I'd like to add to Lucene. Please take a look at the link. I will add more details and diagrams to the page, but for now it should give a rough idea about how to implement it:
>
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing

I'm curious: what is the relationship of this proposal to the design described in the CIKM '08 paper "Supporting Sub-Document Updates and Queries in an Inverted Index" (nota bene, coming from IBM folks)?

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
Re: Parallel incremental indexing
Cool stuff!

We should also think about how to do single-document field updates or field adds, since that is the most common use case - not that it needs to be implemented in the first version, but it should be kept in mind so we don't box ourselves in.

Doug mentioned some ideas in passing almost a year ago about how to add a field to a single document, and they are similar in that they used a parallel reader. IndexWriter would be modified to maintain the same structure across parallel indexes, as you note. If one wanted to add a new field value to document 1000, one would have to index dummy documents for docs 0-999... Instead of this, the index format should support gaps. On a segment merge, the IndexWriter could simply merge in this new segment.

Anyway, updateable documents are fundamental enough that we should also consider changes to the index format if that makes it easier.

-Yonik
http://www.lucidimagination.com

On Sun, Aug 30, 2009 at 2:23 AM, Michael Busch wrote:
> Hi all,
>
> I just added a wiki page for a new feature I'd like to add to Lucene. Please take a look at the link. I will add more details and diagrams to the page, but for now it should give a rough idea about how to implement it:
>
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing
>
> Basically the idea is to allow updating documents partially, e.g. only a subset of the fields, without having to reindex the entire document. This is a feature that is very often asked for.
>
> We have implemented the solution at IBM and it's working great. It is a technology that has already allowed us to add really exciting new features to products that weren't easily possible before.
>
> The implementation I can currently contribute has some limitations: e.g. multi-threaded indexing is not supported. But let me make clear that this is not a limitation of the design described in the wiki - we have these limitations because we implemented this on top of Lucene's 2.4 APIs. If we decide to add this to Lucene's core we should reimplement some parts to overcome those limitations.
>
> In my opinion this will be a great addition to Lucene that many people will find very useful. In Solr this is also something users often ask for.
>
> In the last few weeks I worked on getting internal approval for the contribution to Lucene, and the good news is that I already have a signed software grant ready - so if the community likes this feature and decides to add it to Lucene there won't be any delay for legal work from IBM's side.
>
> Btw: I will be on vacation from 09/03-09/20 and won't have internet access most of the time, so if I stop responding at the end of next week you'll know why...
>
> Please let me know what you think!
>
> Michael
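To make the "dummy documents versus gaps" point concrete, here is a toy sketch in plain Java. It does not use Lucene and says nothing about how real segments are stored; the class and method names are hypothetical. A dense layout, where the position is the doc ID, needs placeholder entries for docs 0-999 before a value can be attached to doc 1000, while a sparse layout stores only the doc that actually has the field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: real Lucene segments are not stored like this. */
class ParallelFieldSketch {

    /** Dense layout: list position is the doc ID, so earlier docs need placeholders. */
    static List<String> denseAdd(List<String> values, int docId, String value) {
        List<String> out = new ArrayList<>(values);
        while (out.size() < docId) {
            out.add(null); // dummy entry standing in for a doc without this field
        }
        if (out.size() == docId) {
            out.add(value);
        } else {
            out.set(docId, value);
        }
        return out;
    }

    /** Sparse layout: only docs that have the field are stored, so gaps cost nothing. */
    static Map<Integer, String> sparseAdd(Map<Integer, String> values, int docId, String value) {
        Map<Integer, String> out = new TreeMap<>(values);
        out.put(docId, value);
        return out;
    }

    public static void main(String[] args) {
        List<String> dense = denseAdd(new ArrayList<>(), 1000, "new-value");
        Map<Integer, String> sparse = sparseAdd(new TreeMap<>(), 1000, "new-value");
        System.out.println("dense entries:  " + dense.size());  // 1001
        System.out.println("sparse entries: " + sparse.size()); // 1
    }
}
```

Whether a gap-aware format would look anything like a sparse map is an open design question in the thread; the sketch only shows why the dense approach forces the dummy documents.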
Re: [jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
That depends - the links may end up in summaries on different pages (the first sentence, for example). They also provide consistent formatting for class names, so that they pop similarly everywhere. I don't agree with "it makes no sense." I'd make every class name everywhere a link if I could.

- Mark

http://www.lucidimagination.com (mobile)
[jira] Updated: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1855:

Attachment: LUCENE-1855.patch

Here is the patch that implements generics for the Attributes API. The test shows how simple the usage of addAttribute() and friends now is (no casts needed anymore). The code compiles without any unchecked warnings. To test this, I enabled the warnings about unchecked operations globally (which emits about 350 warnings across the whole of Lucene). Two places inside private code need @SuppressWarnings, because the compiler cannot know whether a given AttributeImpl really implements the requested interface.

> Change AttributeSource API to use generics
>
> Key: LUCENE-1855
> URL: https://issues.apache.org/jira/browse/LUCENE-1855
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Michael Busch
> Assignee: Uwe Schindler
> Priority: Minor
> Fix For: 3.0
> Attachments: LUCENE-1855.patch
>
> The AttributeSource API will be easier to use with JDK 1.5 generics.
> Uwe, if you started working on a patch for this already feel free to assign this to yourself.
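As a rough illustration of why a generic addAttribute() removes the casts, here is a self-contained sketch. It is not the patch and not Lucene code: Attr, TermAttr, TermAttrImpl and MiniAttributeSource are hypothetical stand-ins, and the Supplier parameter is an assumption made to keep the example free of reflection. The sketch uses Class.cast() for the conversion, which is a checked alternative to the unchecked cast plus @SuppressWarnings that the comment above mentions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Marker interface, standing in for Lucene's Attribute (hypothetical name). */
interface Attr {}

interface TermAttr extends Attr {
    String term();
    void setTerm(String term);
}

class TermAttrImpl implements TermAttr {
    private String term = "";
    public String term() { return term; }
    public void setTerm(String term) { this.term = term; }
}

/** Simplified, hypothetical stand-in for AttributeSource. */
class MiniAttributeSource {
    private final Map<Class<? extends Attr>, Attr> attributes = new LinkedHashMap<>();

    /** Old style: the return type is the base interface, so every caller must cast. */
    Attr addAttributeUntyped(Class<? extends Attr> attClass, Supplier<? extends Attr> factory) {
        return attributes.computeIfAbsent(attClass, k -> factory.get());
    }

    /** Generic style: the return type follows the class token, no cast at the call site. */
    <A extends Attr> A addAttribute(Class<A> attClass, Supplier<? extends A> factory) {
        Attr impl = attributes.computeIfAbsent(attClass, k -> factory.get());
        return attClass.cast(impl); // checked at runtime, no unchecked warning
    }
}

class GenericsDemo {
    public static void main(String[] args) {
        MiniAttributeSource source = new MiniAttributeSource();

        // Without generics the caller has to cast the result.
        TermAttr viaCast = (TermAttr) source.addAttributeUntyped(TermAttr.class, TermAttrImpl::new);
        viaCast.setTerm("lucene");

        // With generics the compiler infers TermAttr from the class token.
        TermAttr typed = source.addAttribute(TermAttr.class, TermAttrImpl::new);
        System.out.println(typed.term());     // lucene
        System.out.println(typed == viaCast); // true, same attribute instance
    }
}
```

The design point is that the class token carries the type, so the burden of the cast moves from every caller into one place inside the source.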
RE: [jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
I cite the guide from Sun (http://java.sun.com/j2se/javadoc/writingdoccomments/):

--
Use in-line links economically

You are encouraged to add links for API names (listed immediately above) using the {@link} tag. It is not necessary to add links for all API names in a doc comment. Because links call attention to themselves (by their color and underline in HTML, and by their length in source code doc comments), it can make the comments more difficult to read if used profusely. We therefore recommend adding a link to an API name if:

- The user might actually want to click on it for more information (in your judgment), and
- Only for the first occurrence of each API name in the doc comment (don't bother repeating a link)

Our audience is advanced (not novice) programmers, so it is generally not necessary to link to API in the java.lang package (such as String), or other API you feel would be well-known.
--

The general formatting of class names could be solved by using {@link ...} for foreign ones and {@code ...} for the class name itself.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
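A small sketch of what that convention could look like in a doc comment. WordSplitter is a hypothetical class invented for this illustration, not anything in Lucene: the class refers to itself in code font, a foreign type is linked on first mention only, and same-class method links are kept because they scroll the reader to the method on the same page. Note that the code-font inline tag used here needs a Java 5 javadoc tool.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * A tiny word splitter used only to illustrate Javadoc linking style.
 *
 * <p>Inside its own class comment the class is written as {@code WordSplitter}
 * (code font, no self-link), while a foreign type such as {@link List} gets a
 * link on its first mention only; later mentions of List stay plain.
 */
class WordSplitter {

    /**
     * Splits the text on whitespace. See {@link #count(String)} for a
     * same-class method link, which stays on the current page.
     */
    static List<String> split(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w); // skip the empty string produced by blank input
            }
        }
        return words;
    }

    /** Returns the number of words that {@link #split(String)} produces. */
    static int count(String text) {
        return split(text).size();
    }
}
```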
Re: [jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
To add to my argument (and I'm not trying to get you to change this patch, by the way - it's just javadoc, and I'm more interested in the argument than the results this morning ;) ):

You could make a similar argument that links from one method to another in the same class are unneeded - you are already on the page. But they nicely scroll you to what you want to see. The same goes for the class link itself: if you are at the bottom of a long class and click the class link, it takes you to the top and the definition of the class, the same way that when I am in next(), I can click a link to get to the increment() definition.

--
- Mark

http://www.lucidimagination.com
Re: [jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
I agree that it's not necessary - and, like I said in the other email, I'm not actually trying to get you to change it (there are much more important changes to argue for than javadoc) - but I still disagree. I like those links ;)

In my judgment, I never know ahead of time what I will want as a link - I end up using them all over the place, and it depends on the situation every time. I like having them available, though.

I suppose that, in my view, they don't make the javadoc harder to read - I like things linky so I can click on any random occurrence I am focusing on. I know it's personal taste though - similar to code formatting.

- Mark
RE: [jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
No problem, see my other mail! I can revert the @link changes. By the way, I just noticed that @code is Java 5 only. So I could replace it by .

For me the whole class Javadocs were a little bit over-linkified, with links pointing to the same class itself. I only wanted to remove links (as the guide from Sun notes) that point to the exact same class the description is about (in the class description).

I am a real fan of linking everything, so links between methods are very important!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Re: [jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
No, no, I wasn't arguing for a revert - more interested in exploring the topic :) If you think it's cleaner, you did way more on the TokenStream API stuff than me, and I think your javadoc preference on it should outweigh mine.

- Mark

http://www.lucidimagination.com
[jira] Commented: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
[ https://issues.apache.org/jira/browse/LUCENE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749282#action_12749282 ] Robert Muir commented on LUCENE-1862: - Uwe, I think you are right, but the real problem in this case is that SmartChineseAnalyzer is not in the smartcn package. It is under o.a.l.analysis instead of o.a.l.analysis.smartcn (with all the other smartcn tokenizers/tokenfilters, where it really belongs IMHO). If we put package.html and SmartChineseAnalyzer one level lower, things would make more sense in my opinion.
[jira] Commented: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
[ https://issues.apache.org/jira/browse/LUCENE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749283#action_12749283 ] Uwe Schindler commented on LUCENE-1862: --- +1 I was wondering why the analyzer is not in the sub-package. I think we can change this even in release phase (as the whole package is experimental...)
[jira] Updated: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
[ https://issues.apache.org/jira/browse/LUCENE-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1875: -- Attachment: LUCENE-1875.patch Replaces {@code ...} with the javadoc 1.4 compatible <code>...</code>. Will commit soon. > Javadoc of TokenStream.end() somehow confusing > -- > > Key: LUCENE-1875 > URL: https://issues.apache.org/jira/browse/LUCENE-1875 > Project: Lucene - Java > Issue Type: Bug > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1875.patch, LUCENE-1875.patch > > > The Javadocs of TokenStream.end() are somehow confusing, because they also refer to the old TokenStream API ("after next() returned null"). But one who implements his TokenStream with the old API cannot make use of the end() feature, as he would not use attributes and so cannot update the end offsets (he could, but then he should rewrite the whole TokenStream). To be conform to the old API, there must be an end(Token) method, which we will not add. > I would drop the old API from this docs.
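A minimal sketch of the two Javadoc styles this update is about, assuming the replacement markup that was lost from the message is the HTML <code> element. The class and method names are hypothetical and are not taken from the LUCENE-1875 patch.

```java
/** Hypothetical class for illustration; not part of Lucene or the patch. */
class JavadocStyles {

  /**
   * Javadoc 1.4 compatible markup: returns <code>true</code> when nothing
   * is left to consume.
   */
  static boolean isExhaustedOldStyle(int remaining) {
    return remaining == 0;
  }

  /**
   * Java 5 and later: the inline tag {@code true} renders in code font and
   * does not interpret HTML inside the braces.
   */
  static boolean isExhaustedNewStyle(int remaining) {
    return remaining == 0;
  }

  public static void main(String[] args) {
    // Both methods behave identically; only the generated documentation differs.
    System.out.println(isExhaustedOldStyle(0) == isExhaustedNewStyle(0));
  }
}
```

The old-style form is the one to prefer while the documentation still has to build with a 1.4 javadoc tool.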
[jira] Commented: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
[ https://issues.apache.org/jira/browse/LUCENE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749285#action_12749285 ] Mark Miller commented on LUCENE-1862: - bq. I think we can change this even in release phase (as the whole package is experimental...) +1
[jira] Resolved: (LUCENE-1865) Add a ton of missing license headers throughout test/demo/contrib
[ https://issues.apache.org/jira/browse/LUCENE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1865. - Resolution: Fixed Thanks for finishing this. > Add a ton of missing license headers throughout test/demo/contrib > - > > Key: LUCENE-1865 > URL: https://issues.apache.org/jira/browse/LUCENE-1865 > Project: Lucene - Java > Issue Type: Task >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1865-part2.patch, LUCENE-1865.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-1875) Javadoc of TokenStream.end() somehow confusing
[ https://issues.apache.org/jira/browse/LUCENE-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1875. --- Resolution: Fixed Committed revision: 809381
[jira] Created: (LUCENE-1876) Some contrib packages are missing a package.html
Some contrib packages are missing a package.html Key: LUCENE-1876 URL: https://issues.apache.org/jira/browse/LUCENE-1876 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Reporter: Mark Miller Priority: Trivial Fix For: 2.9 Dunno if we will get to this one this release, but a few contribs don't have a package.html (or a good overview that would work as a replacement) - I don't think this is hugely important, but I think it is important - you should be able to easily and quickly read a quick overview for each contrib I think. So far I have identified collation and spatial. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1876) Some contrib packages are missing a package.html
[ https://issues.apache.org/jira/browse/LUCENE-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749291#action_12749291 ] Mark Miller commented on LUCENE-1876: - also db and remote
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749292#action_12749292 ] Uwe Schindler commented on LUCENE-1855: --- As I have already started to move Lucene to Java 1.5, here is an idea for how to get the generics in: I would propose to switch on all "unchecked" warnings (as in the supplied patch) in the ant build. The problem then is that hundreds of warnings are printed out. I would then like to add a @SuppressWarnings("unchecked") to all classes that are not yet rewritten to use generics for collections and other generified Java things (like Class in AttributeSource). The warnings should then disappear. We could then search for SuppressWarnings annotations in the source and convert the classes one by one, adding generics. This is simpler, because you only get warnings for the class you are working on. What do you think? > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you.
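The migration strategy proposed here could look roughly like the sketch below. Both buffer classes are hypothetical and not taken from the Lucene source; the first stands for a class that has not been converted yet and is silenced with the annotation, the second for the same class after conversion.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo driver; none of these classes exist in Lucene. */
class SuppressWarningsDemo {
  public static void main(String[] args) {
    LegacyTermBuffer legacy = new LegacyTermBuffer();
    legacy.add("lucene");
    GenerifiedTermBuffer converted = new GenerifiedTermBuffer();
    converted.add("lucene");
    System.out.println(legacy.get(0).equals(converted.get(0)));
  }
}

/** Not yet converted: raw collection types, unchecked warnings muted for the whole class. */
@SuppressWarnings("unchecked")
class LegacyTermBuffer {
  private final List terms = new ArrayList(); // raw type

  void add(String term) {
    terms.add(term); // unchecked call on a raw List, silenced by the annotation
  }

  String get(int index) {
    return (String) terms.get(index); // cast needed, element type is unknown
  }
}

/** Converted: annotation removed, type parameters added, no casts left. */
class GenerifiedTermBuffer {
  private final List<String> terms = new ArrayList<String>();

  void add(String term) {
    terms.add(term);
  }

  String get(int index) {
    return terms.get(index);
  }
}
```

Searching the source tree for the annotation then gives a to-do list of classes that still need conversion, and removing it from one class surfaces only that class's warnings.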
[jira] Commented: (LUCENE-1862) duplicate package.html files in queryParser and analsysis.cn packages
[ https://issues.apache.org/jira/browse/LUCENE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749293#action_12749293 ] Mark Miller commented on LUCENE-1862: - Just as an expansion on my feelings about any changes: I posted the rules that are suggested on the wiki - but I think they are a bit harsh (e.g. only serious bug fixes). My thought is that the important part is: {quote}Keep in mind that it is our main intention to keep the branch as stable as possible.{quote} I think that anything is fair game as long as it's clear it will not affect stability. If everybody thinks something is a good idea, and they don't think it has the reach to affect stability (or undercut the testing that has already occurred), I don't see why we wouldn't do it - as long as it's discussed first and given a bit of time to ensure consensus.
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749294#action_12749294 ] Uwe Schindler commented on LUCENE-1855: --- About the backwards compatibility: After adding generics, the backwards test should run all tests from 2.9 (compiled with 1.4) against the generified trunk jar. After branching I would start to manage this. Normally generics do not bring backwards incompatibility, because they are simply removed (erased) by the compiler. You only have problems at places where the erased generics should not be replaced by java.lang.Object. E.g. in the AttributeSource call, it should return an Attribute and not Object (because of this you need to generify the whole method by the generics prefix, defining "A" as subclass of "Attribute"). If you use a naive approach to add generics, it could lead to addAttribute returning Object (and so a link error would occur). To prevent this, the backwards tests against 2.9 are a good solution. By the way, the generics for AttributeSource were copied from j.l.Class and its methods to get annotations :)
[jira] Issue Comment Edited: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749294#action_12749294 ] Uwe Schindler edited comment on LUCENE-1855 at 8/30/09 10:49 AM: - About the backwards compatibility: After adding generics, the backwards test should run all tests from 2.9 (compiled with 1.4) against the generified trunk jar. After branching I would start to manage this. Normally generics do not bring backwards incompatibility, because they are simply removed (erased) by the compiler. You only have problems at places where the erased generics should not be replaced by java.lang.Object. E.g. in the AttributeSource.addAttribute() call, it should return an Attribute (subclass) as in 2.9 and not Object (because of this you need to generify the whole method by the generics prefix, defining "A" as subclass of "Attribute"). If you use a naive approach to add generics, it could lead to addAttribute returning Object (and so a link error would occur). To prevent this, the backwards tests against 2.9 are a good solution. By the way, the generics for AttributeSource were copied from j.l.Class and its methods to get annotations :)
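The erasure point can be checked with reflection. The sketch below uses hypothetical stand-in types (the Attribute interface, DemoAttributeImpl and both source classes are invented for illustration and are not the Lucene classes); it only shows that a bounded type parameter keeps Attribute as the compiled return type, while an unbounded one falls back to java.lang.Object.

```java
import java.lang.reflect.Method;

/** Hypothetical demo; the types below are stand-ins, not the Lucene classes. */
class ErasureDemo {
  static <T> T instantiate(Class<T> type) {
    try {
      return type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns the return type recorded in the compiled (erased) method signature. */
  static Class<?> erasedReturnType(Class<?> owner) {
    try {
      Method m = owner.getDeclaredMethod("addAttribute", Class.class);
      return m.getReturnType();
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(erasedReturnType(NaiveSource.class).getName());
    System.out.println(erasedReturnType(BoundedSource.class).getName());
  }
}

interface Attribute {}

class DemoAttributeImpl implements Attribute {}

class NaiveSource {
  // Unbounded type parameter: after erasure the method returns java.lang.Object.
  <A> A addAttribute(Class<A> type) {
    return ErasureDemo.instantiate(type);
  }
}

class BoundedSource {
  // Bounded type parameter: after erasure the method returns Attribute,
  // the same descriptor a pre-generics "Attribute addAttribute(Class)" has.
  <A extends Attribute> A addAttribute(Class<A> type) {
    return ErasureDemo.instantiate(type);
  }
}
```

Because the JVM resolves methods by name, parameter types and return type together, code compiled against a signature returning Attribute would not link against the naive variant, which is the kind of error the backwards tests are meant to catch.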
[jira] Created: (LUCENE-1877) Improve IndexWriter javadoc on locking
Improve IndexWriter javadoc on locking -- Key: LUCENE-1877 URL: https://issues.apache.org/jira/browse/LUCENE-1877 Project: Lucene - Java Issue Type: Improvement Components: Javadocs Reporter: Mark Miller Priority: Trivial Fix For: 2.9 A user requested we add a note in IndexWriter pointing out the availability of NativeFSLockFactory (allowing you to avoid retaining locks on abnormal JVM exit). Seems reasonable to me - we want users to be able to easily stumble upon this class. The below code looks like a good spot to add a note - we could also improve what's there a bit - opening an IndexWriter does not necessarily create a lock file - that would depend on the LockFactory used. {code} Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a {@link LockObtainFailedException}. The {@link LockObtainFailedException} is also thrown if an IndexReader on the same directory is used to delete documents from the index.{code} Anyone remember why NativeFSLockFactory is not the default over SimpleFSLockFactory?
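The difference between the two locking styles can be sketched with the JDK alone. This is not Lucene code: LockSketch and its methods are hypothetical, and the sketch assumes, as the Lucene javadocs describe, that SimpleFSLockFactory relies on File.createNewFile() while NativeFSLockFactory relies on java.nio file locks.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/** Hypothetical helper for illustration; not Lucene code. */
class LockSketch {

  /** Marker-file lock: held for as long as the file exists, even after a crash. */
  static boolean obtainMarkerLock(File lockFile) {
    try {
      return lockFile.createNewFile(); // false if the file is already there
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** OS-level lock: tied to the open channel, so the OS drops it when the process ends. */
  static FileLock obtainNativeLock(File lockFile) {
    try {
      FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
      FileLock lock = channel.tryLock(); // null if another process holds the lock
      if (lock == null) {
        channel.close();
      }
      return lock;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Closing the channel releases the lock. */
  static void release(FileLock lock) {
    try {
      lock.channel().close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    File tmp = new File(System.getProperty("java.io.tmpdir"));
    File marker = new File(tmp, "marker-" + System.nanoTime() + ".lock");
    System.out.println("marker obtained: " + obtainMarkerLock(marker));
    System.out.println("marker obtained again: " + obtainMarkerLock(marker));
    marker.delete(); // a crashed process would never reach this line

    File nativeFile = new File(tmp, "native-" + System.nanoTime() + ".lock");
    FileLock lock = obtainNativeLock(nativeFile);
    System.out.println("native lock valid: " + (lock != null && lock.isValid()));
    if (lock != null) {
      release(lock);
    }
    nativeFile.delete();
  }
}
```

With the marker-file style, a leftover file after an abnormal JVM exit keeps blocking new writers until someone removes it, which is the situation the requested javadoc note is meant to help users avoid.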
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749301#action_12749301 ] Uwe Schindler commented on LUCENE-1877: --- For IndexWriter/IndexReader this hint is no longer needed (in Lucene 2.9), as all methods taking String/File instead of Directory are deprecated; users should create Directory instances themselves and will then automatically get to the place where the LockFactory can be supplied. The note should be added to FSDirectory instead.
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749302#action_12749302 ] Mark Miller commented on LUCENE-1855: - bq. By the way, the generics for AttributeSource were copied from j.l.Class and its methods to get annotations From the apache harmony project I hope ;)
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749303#action_12749303 ] Mark Miller commented on LUCENE-1877: - My initial thought was also that it didn't really belong in IndexWriter - but I sold myself on the fact that IndexWriter talks about locking and offers the force unlock method - so it seems fine to me to mention how to use the optimal locking factory (and generally avoid using the force unlock at all - as an aside, I just saw a guy trying to use that the other day as regular code so that they could use two IndexWriters with just commit rather than close - ugh). I'm not sold either way though - I'd go with whatever. My preference would really be to make it the default.
[jira] Issue Comment Edited: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749303#action_12749303 ] Mark Miller edited comment on LUCENE-1877 at 8/30/09 11:53 AM: --- My initial thought was also that it didn't really belong in IndexWriter - but I sold myself on the fact that IndexWriter talks about locking and offers the force unlock method - so it seems fine to me to mention how to use the optimal locking factory (and generally avoid using the force unlock at all - as an aside I just saw a guy trying to use that the other day as regular code so that they could use two IndexWriters with just commit rather than close - ugg). I'm not sold either way though - I'd go with whatever. My preference would really be to make it the default (though of course not for 2.9). was (Author: markrmil...@gmail.com): My initial thought was also that it didn't really belong in IndexWriter - but I sold myself on the fact that IndexWriter talks about locking and offers the force unlock method - so it seems fine to me to mention how to use the optimal locking factory (and generally avoid using the force unlock at all - as an aside I just saw a guy trying to use that the other day as regular code so that they could use two IndexWriters with just commit rather than close - ugg). I'm not sold either way though - I'd go with whatever. My preference would really be to make it the default. > Improve IndexWriter javadoc on locking > -- > > Key: LUCENE-1877 > URL: https://issues.apache.org/jira/browse/LUCENE-1877 > Project: Lucene - Java > Issue Type: Improvement > Components: Javadocs >Reporter: Mark Miller >Priority: Trivial > Fix For: 2.9 > > > A user requested we add a note in IndexWriter alerting the availability of > NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm > exit). Seems reasonable to me - we want users to be able to easily stumble > upon this class. 
The below code looks like a good spot to add a note - could > also improve what's there a bit - opening an IndexWriter does not necessarily > create a lock file - that would depend on the LockFactory used. > {code} Opening an IndexWriter creates a lock file for the > directory in use. Trying to open > another IndexWriter on the same directory will lead to a > {@link LockObtainFailedException}. The {@link > LockObtainFailedException} > is also thrown if an IndexReader on the same directory is used to delete > documents > from the index.{code} > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
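For readers wondering why the two approaches behave differently on abnormal JVM exit, here is a rough sketch using only JDK classes. It assumes, loosely, that a "simple" lock is an atomically created marker file and that a "native" lock is an OS-level lock taken through java.nio. The class LockStyles and its method names are made up for illustration and are not Lucene API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/** Hypothetical demo class; not part of Lucene. */
final class LockStyles {

    /** "Simple" style: the lock is held for as long as the file exists. */
    static boolean obtainSimple(File lockFile) {
        try {
            // Atomic create: false if the file is already there, which
            // includes a stale file left behind by a JVM that crashed.
            return lockFile.createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** "Native" style: the lock is an OS lock tied to an open channel. */
    static FileLock obtainNative(File lockFile) {
        FileChannel channel = null;
        FileLock lock = null;
        try {
            channel = new RandomAccessFile(lockFile, "rw").getChannel();
            // Returns null if another process holds the lock. The OS drops
            // the lock when the owning process dies, so a leftover file
            // does not block the next writer.
            lock = channel.tryLock();
            return lock;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (lock == null && channel != null) {
                try { channel.close(); } catch (IOException ignored) { }
            }
        }
    }
}
```

With the simple style, recovering from a crash means deleting the marker file by hand (or force unlocking). With the native style the file may stay on disk but no longer counts as a lock. Note that a second obtainNative call on the same file from the same JVM throws OverlappingFileLockException; guarding against that is left out of this sketch.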
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749305#action_12749305 ] Michael Busch commented on LUCENE-1855: --- {quote} From the apache harmony project I hope {quote} No worries... the original patch (LUCENE-1422) had the 1.5 version attached as comments all along and wasn't copied from anywhere. I originally designed the API with 1.5 generics and ported it back to 1.4 to be able to commit it. We removed those comments from trunk with some later patch (probably LUCENE-1693), and Uwe has brought them back now. > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Deprecated class in spatial contrib
The spatial contrib has not been in a release before, so just wondering why there are deprecated classes in it - should we remove those, or was there a good reason to keep them? In general, it seems we should only deprecate what's been in a release, and simply change everything else? -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749306#action_12749306 ] Uwe Schindler commented on LUCENE-1855: --- bq. From the apache harmony project I hope No problem at all, it was really only this public API line that *inspired* me... I did not copy any code, only the public API: {code} public <A extends Annotation> A getAnnotation(Class<A> annotationClass) {code} from [http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Class.html#getAnnotation(java.lang.Class)]. Michael: This was not in the comments (at least not in the comments from the latest Lucene code before our rewrite). > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
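To make the getAnnotation-style signature concrete, here is a small sketch of how the same generic pattern lets a lookup keyed by a class token return the right type without a cast at the call site. Attr, TermAttr and AttrSource are hypothetical stand-ins invented for this example; they are not the actual Lucene classes or the contents of the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical marker interface, standing in for an attribute type. */
interface Attr { }

/** Hypothetical attribute holding a term string. */
final class TermAttr implements Attr {
    String term = "";
}

/** Hypothetical source that owns at most one instance per attribute class. */
final class AttrSource {
    private final Map<Class<? extends Attr>, Attr> attrs = new LinkedHashMap<>();

    /** Same shape as Class.getAnnotation: the class token fixes the return type. */
    <A extends Attr> A addAttribute(Class<A> type) {
        Attr existing = attrs.get(type);
        if (existing == null) {
            try {
                existing = type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot instantiate " + type.getName(), e);
            }
            attrs.put(type, existing);
        }
        // Checked cast through the class token, so no unchecked warning.
        return type.cast(existing);
    }
}
```

A caller then writes `TermAttr t = source.addAttribute(TermAttr.class);` with no explicit cast, which is the ease-of-use gain the issue description refers to.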
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749307#action_12749307 ] Mark Miller commented on LUCENE-1855: - no worries here either guys - far from the copyright police here - just a wink comment. > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1873) Update site lucene-sandbox page
[ https://issues.apache.org/jira/browse/LUCENE-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749308#action_12749308 ] Mark Miller commented on LUCENE-1873: - Here is a rough draft - if you have any comments or suggestions, I'd be happy to take them. If you originally wrote a contrib or know it well, feel free to provide a better overview. I'll probably make one more pass myself at some point. I'll also include a link to the package.html (for the contribs that have it).
analyzers: Contributed Analyzers, Tokenizers, and Filters for various uses and languages.
ant: Ant task to create Lucene indexes.
benchmark: The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora.
collation: CollationKeyFilter/Analyzer and ICUCollationKeyFilter/Analyzer.
db: Provides integration with Berkeley DB.
highlighter: A set of classes for highlighting matching terms in search results.
fast-vector-highlighter: An alternative set of classes for highlighting matching terms in search results that relies on stored term vectors.
instantiated: RAM-based index that enables much faster searching than RAMDirectory in certain situations.
lucli: An application that allows Lucene index manipulation from the command-line.
memory: High-performance single-document main memory index.
misc: A variety of miscellaneous files, including QueryParsers, and other alternate Lucene class implementations and tools.
queryparser: A new Lucene query parser implementation, which matches the syntax of the core QueryParser but offers a more modular architecture to enable customization.
regex: Queries with additional regex matching capabilities.
remote: Classes to help use Lucene with RMI.
snowball: Pre-compiled versions of the Snowball stemmers for Lucene.
spatial: Classes to help with efficient distance based sorting.
spellchecker: Provides tools for spellchecking and suggestions with Lucene.
surround: A QueryParser that also supports the Span family of queries.
swing: Swing components designed to integrate with Lucene.
wikipedia: Tools for working with Wikipedia content.
wordnet: Tools to help utilize WordNet synonyms with Lucene.
xml-query-parser: A QueryParser that can read queries written in an XML format.
> Update site lucene-sandbox page > --- > > Key: LUCENE-1873 > URL: https://issues.apache.org/jira/browse/LUCENE-1873 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > > The page has misleading/bad info. One thing I would like to do - but I won't > attempt now (prob good for the modules issue) - is commit to one word - > contrib or sandbox. I think sandbox should be purged myself. > The current page says that the sandbox is kind of a rats nest with various > early stage software that one day may make it into core - that info is > outdated I think. We should replace it, and also specify how the back compat > policy works in contrib eg each contrib can have its own policy, with the > default being no policy. > We should also drop the piece about being open to Lucene's committers and > others - a bit outdated. > We should also either include the other contribs, or change the wording to > indicate that the list is only a sampling of the many contribs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749312#action_12749312 ] Michael Busch commented on LUCENE-1855: --- {quote} Michael: This was not in the comments (at least not in the comments from the latest Lucene code before our rewrite). {quote} It was. The LUCENE-1693 commit removed it. And my first 1693 patch left it in, so either me or you removed it in the subsequent 1693 patches. It's not really important I guess :) As you and Mark said: it's just public API stuff that we could have gotten from the javadocs anyway. > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Deprecated class in spatial contrib
+1 How obvious!! On Aug 30, 2009, at 3:04 PM, Mark Miller wrote: The spatial contrib has not been in a release before, so just wondering why there are deprecated classes in it - should we remove those, or was there a good reason to keep them? In general, it seem we should just deprecate whats been in a release, and change otherwise? -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Deprecated class in spatial contrib
+1, we should remove them. Mike On Sun, Aug 30, 2009 at 3:04 PM, Mark Miller wrote: > The spatial contrib has not been in a release before, so just wondering > why there are deprecated classes in it - should we remove those, or was > there a good reason to keep them? In general, it seem we should just > deprecate whats been in a release, and change otherwise? > > -- > - Mark > > http://www.lucidimagination.com > > > > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749322#action_12749322 ] Michael McCandless commented on LUCENE-1877: bq. Anyone remember why NativeFSLockFactory is not the default over SimpleFSLockFactory? In my testing (long ago) over NFS, I actually found "native" locks didn't work as well as "simple" locks. I was also a bit nervous on how well supported "native" locks are across different OSs. bq. My preference would really be to make it the default (though of course not for 2.9). +1, I think it's the better default. People who use Lucene over NFS already have to do special things (eg make a custom deletion policy), and far too many users hit the "leftover lock file" problem. We could state in the javadocs that this default will change in 3.0? Maybe just add one sentence in that IndexWriter locking section, referencing the discussion in NativeFSLockFactory's javadocs about not having the "leftover lock file" problem? > Improve IndexWriter javadoc on locking > -- > > Key: LUCENE-1877 > URL: https://issues.apache.org/jira/browse/LUCENE-1877 > Project: Lucene - Java > Issue Type: Improvement > Components: Javadocs >Reporter: Mark Miller >Priority: Trivial > Fix For: 2.9 > > > A user requested we add a note in IndexWriter alerting the availability of > NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm > exit). Seems reasonable to me - we want users to be able to easily stumble > upon this class. The below code looks like a good spot to add a note - could > also improve what's there a bit - opening an IndexWriter does not necessarily > create a lock file - that would depend on the LockFactory used. > {code} Opening an IndexWriter creates a lock file for the > directory in use. Trying to open > another IndexWriter on the same directory will lead to a > {@link LockObtainFailedException}.
The {@link > LockObtainFailedException} > is also thrown if an IndexReader on the same directory is used to delete > documents > from the index.{code} > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749324#action_12749324 ] Uwe Schindler commented on LUCENE-1877: --- Let's do it in the following way:
- The deprecated FSDir.getDirectory() methods return the SimpleFSLockFactory, as before.
- The new FSDir.open() methods and also the direct ctors of SimpleFSDir, MMapFSDir, NIOFSDir default to NativeFSLockFactory (these ctors/methods are all new in 2.9).
Because of this we have no backwards-compatibility problem. > Improve IndexWriter javadoc on locking > -- > > Key: LUCENE-1877 > URL: https://issues.apache.org/jira/browse/LUCENE-1877 > Project: Lucene - Java > Issue Type: Improvement > Components: Javadocs >Reporter: Mark Miller >Priority: Trivial > Fix For: 2.9 > > > A user requested we add a note in IndexWriter alerting the availability of > NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm > exit). Seems reasonable to me - we want users to be able to easily stumble > upon this class. The below code looks like a good spot to add a note - could > also improve what's there a bit - opening an IndexWriter does not necessarily > create a lock file - that would depend on the LockFactory used. > {code} Opening an IndexWriter creates a lock file for the > directory in use. Trying to open > another IndexWriter on the same directory will lead to a > {@link LockObtainFailedException}. The {@link > LockObtainFailedException} > is also thrown if an IndexReader on the same directory is used to delete > documents > from the index.{code} > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
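The proposal above boils down to a common back-compat pattern: freeze the behavior of the old entry point and give only the new entry point the new default. A miniature sketch follows, assuming nothing about the real FSDirectory internals; the Dir class and the LockKind enum are hypothetical and exist only to show the shape of the idea.

```java
/** Hypothetical miniature of the proposal; not Lucene's FSDirectory. */
final class Dir {
    /** Which locking style a directory instance uses. */
    enum LockKind { SIMPLE, NATIVE }

    final String path;
    final LockKind lockKind;

    private Dir(String path, LockKind lockKind) {
        this.path = path;
        this.lockKind = lockKind;
    }

    /** Old entry point: default is frozen, so existing callers see no change. */
    @Deprecated
    static Dir getDirectory(String path) {
        return new Dir(path, LockKind.SIMPLE);
    }

    /** New entry point: nobody depends on it yet, so it can pick the better default. */
    static Dir open(String path) {
        return new Dir(path, LockKind.NATIVE);
    }
}
```

Code that still calls the deprecated method keeps the old locking, and code that migrates to the new method opts into the new default as part of the migration.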
[jira] Updated: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1855: -- Attachment: LUCENE-1855.patch bq. It was. OK. I found it :-) - in the first version after checkin. Attached is a new patch, which now also uses generics in TokenStream. The next step is to convert all Tokenizers (as always...). Michael: Is this patch how you want it? > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch, LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1855) Change AttributeSource API to use generics
[ https://issues.apache.org/jira/browse/LUCENE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749328#action_12749328 ] Michael Busch commented on LUCENE-1855: --- Uwe, I'm not home right now, will look tonight! Thanks for writing the patch! > Change AttributeSource API to use generics > -- > > Key: LUCENE-1855 > URL: https://issues.apache.org/jira/browse/LUCENE-1855 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Michael Busch >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: LUCENE-1855.patch, LUCENE-1855.patch > > > The AttributeSource API will be easier to use with JDK 1.5 generics. > Uwe, if you started working on a patch for this already feel free to assign > this to you. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749330#action_12749330 ] Marvin Humphrey commented on LUCENE-1877: - > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? Wasn't it because native locking is sometimes implemented with Fcntl, and Fcntl locking blows chunks? Especially for a library rather than an application. From the BSD manpage on Fcntl: {quote} This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. For example if an application for updating the password file locks the password file database while making the update, and then calls getpwname(3) to retrieve a record, the lock will be lost because getpwname(3) opens, reads, and closes the password database. The database close will release all locks that the process has associated with the database, even if the library routine never requested a lock on the database. Another minor semantic problem with this interface is that locks are not inherited by a child process created using the fork(2) function. The flock(2) interface has much more rational last close semantics and allows locks to be inherited by child processes. Flock(2) is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children... {quote} The lockfile may be annoying, but at least it's guaranteed safe on all non-shared volumes when the OS implements atomic file opening. Are you folks at least able to clean up orphaned lockfiles if the PID they were created under is no longer active?
> Improve IndexWriter javadoc on locking > -- > > Key: LUCENE-1877 > URL: https://issues.apache.org/jira/browse/LUCENE-1877 > Project: Lucene - Java > Issue Type: Improvement > Components: Javadocs >Reporter: Mark Miller >Priority: Trivial > Fix For: 2.9 > > > A user requested we add a note in IndexWriter alerting the availability of > NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm > exit). Seems reasonable to me - we want users to be able to easily stumble > upon this class. The below code looks like a good spot to add a note - could > also improve what's there a bit - opening an IndexWriter does not necessarily > create a lock file - that would depend on the LockFactory used. > {code} Opening an IndexWriter creates a lock file for the > directory in use. Trying to open > another IndexWriter on the same directory will lead to a > {@link LockObtainFailedException}. The {@link > LockObtainFailedException} > is also thrown if an IndexReader on the same directory is used to delete > documents > from the index.{code} > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749334#action_12749334 ] Mark Miller commented on LUCENE-1877: - {quote}This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. For example if an application for updating the password file locks the password file database while making the update, and then calls getpwname(3) to retrieve a record, the lock will be lost because getpwname(3) opens, reads, and closes the password database. The database close will release all locks that the process has associated with the database, even if the library routine never requested a lock on the database. Another minor semantic problem with this interface is that locks are not inherited by a child process created using the fork(2) function. The flock(2) interface has much more rational last close semantics and allows locks to be inherited by child processes. Flock(2) is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children... {quote} I can see how this is not ideal, but I'm not seeing how any of the mentioned issues apply to our simple lock usage ... > Improve IndexWriter javadoc on locking > -- > > Key: LUCENE-1877 > URL: https://issues.apache.org/jira/browse/LUCENE-1877 > Project: Lucene - Java > Issue Type: Improvement > Components: Javadocs >Reporter: Mark Miller >Priority: Trivial > Fix For: 2.9 > > > A user requested we add a note in IndexWriter alerting the availability of > NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm > exit). 
Seems reasonable to me - we want users to be able to easily stumble > upon this class. The below code looks like a good spot to add a note - could > also improve what's there a bit - opening an IndexWriter does not necessarily > create a lock file - that would depend on the LockFactory used. > {code} Opening an IndexWriter creates a lock file for the > directory in use. Trying to open > another IndexWriter on the same directory will lead to a > {@link LockObtainFailedException}. The {@link > LockObtainFailedException} > is also thrown if an IndexReader on the same directory is used to delete > documents > from the index.{code} > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749336#action_12749336 ] Mark Miller commented on LUCENE-1877: - bq. People who use Lucene over NFS already have to do special things (eg make a custom deletion policy), and far too many users hit the "leftover lock file" problem. We could state in the javadocs that this default will change in 3.0? +1 from me - if it made things work out of the box with NFS, I'd vote to keep as is, but the points you mention were in my head too. My only worry is current users counting on this default for NFS - but if we put it in the back compat break section (a break in regards to NFS anyway), that should be sufficient warning? > Improve IndexWriter javadoc on locking > -- > > Key: LUCENE-1877 > URL: https://issues.apache.org/jira/browse/LUCENE-1877 > Project: Lucene - Java > Issue Type: Improvement > Components: Javadocs >Reporter: Mark Miller >Priority: Trivial > Fix For: 2.9 > > > A user requested we add a note in IndexWriter alerting the availability of > NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm > exit). Seems reasonable to me - we want users to be able to easily stumble > upon this class. The below code looks like a good spot to add a note - could > also improve whats there a bit - opening an IndexWriter does not necessarily > create a lock file - that would depend on the LockFactory used. > {code} Opening an IndexWriter creates a lock file for the > directory in use. Trying to open > another IndexWriter on the same directory will lead to a > {...@link LockObtainFailedException}. The {...@link > LockObtainFailedException} > is also thrown if an IndexReader on the same directory is used to delete > documents > from the index.{code} > Anyone remember why NativeFSLockFactory is not the default over > SimpleFSLockFactory? -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Lucene release announcement
I started trying out a list of sample features in the RC release emails, but it's really just a quick hack I pulled out. If anyone has any suggestions of what they would like to see incorporated into the final release feature list, let me know. Not the most important thing in the world I know, but as a lover of a good feature/changes list, I'd like the sample features to lure you in :) One large omission that I will be adding: spatial contrib If you have any suggestions/comments, please attach to this thread. Any corrections as well - sometimes it's hard to summarize some of this stuff I've never really looked that hard at. To be honest, it's hard to summarize the stuff I have looked a lot at sometimes :) -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1873) Update site lucene-sandbox page
[ https://issues.apache.org/jira/browse/LUCENE-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1873: Attachment: LUCENE-1873.patch first rough draft patch > Update site lucene-sandbox page > --- > > Key: LUCENE-1873 > URL: https://issues.apache.org/jira/browse/LUCENE-1873 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > Attachments: LUCENE-1873.patch > > > The page has misleading/bad info. One thing I would like to do - but I won't > attempt now (prob good for the modules issue) - is commit to one word - > contrib or sandbox. I think sandbox should be purged myself. > The current page says that the sandbox is kind of a rats nest with various > early stage software that one day may make it into core - that info is > outdated I think. We should replace it, and also specify how the back compat > policy works in contrib eg each contrib can have its own policy, with the > default being no policy. > We should also drop the piece about being open to Lucene's committers and > others - a bit outdated. > We should also either include the other contribs, or change the wording to > indicate that the list is only a sampling of the many contribs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1878) remove deprecated classes from spatial
remove deprecated classes from spatial -- Key: LUCENE-1878 URL: https://issues.apache.org/jira/browse/LUCENE-1878 Project: Lucene - Java Issue Type: Task Components: contrib/spatial Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 2.9 spatial has not been released, so we can remove the deprecated classes -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1878) remove deprecated classes from spatial
[ https://issues.apache.org/jira/browse/LUCENE-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1878: Attachment: LUCENE-1878.patch removes the deprecated classes - there were two public fields declared of a deprecated type - they were not used, so they are pulled in this patch. Spatial could really use some love - it's not in the best shape for its debut - there is no good overview, no package.html explaining anything, and extremely sparse javadoc. It's in less than great shape for users and maintenance I think. You can look at the tests for a little help though. But for someone who just wants to know whether they would even want to use it... > remove deprecated classes from spatial > -- > > Key: LUCENE-1878 > URL: https://issues.apache.org/jira/browse/LUCENE-1878 > Project: Lucene - Java > Issue Type: Task > Components: contrib/spatial >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1878.patch > > > spatial has not been released, so we can remove the deprecated classes -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1874) Further updates to the site scoring page
[ https://issues.apache.org/jira/browse/LUCENE-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1874: Attachment: LUCENE-1874.patch removes the outdated and empty section > Further updates to the site scoring page > > > Key: LUCENE-1874 > URL: https://issues.apache.org/jira/browse/LUCENE-1874 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1874.patch > > > update the site scoring page - see Appendix: > {quote} > Class Diagrams > Karl Wettin's UML on the Wiki > {quote} > Karl's diagrams are outdated - I think this link should be pulled for 2.9 > {quote} > Sequence Diagrams > FILL IN HERE. Volunteers? > {quote} > I think this should be pulled - I say put something like this as a task in > JIRA - not the published site docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1877) Improve IndexWriter javadoc on locking
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749363#action_12749363 ]

Marvin Humphrey commented on LUCENE-1877:

> I can see how this is not ideal, but I'm not seeing how any of the
> mentioned issues apply to our simple lock usage ...

"Simple lock usage"?! You must have a bigger brain than me...

As a matter of fact, I think you're right. Fcntl locks have two major drawbacks, and upon review I think NativeFSLockFactory avoids both of them.

The first is that opening and then closing a file releases every lock the process holds on that file. Even if you never request a lock on the second filehandle, or if you request a lock and the request is denied, closing the second filehandle releases the lock held through the first filehandle. NativeFSLockFactory avoids that problem by keeping a HashSet of filepaths and ensuring that the same file is never opened twice. Furthermore, since the lockfiles are private to Lucene, you can assume that nobody else is going to open them and inadvertently spoil the lock.

The second is that child processes spawned via fork() do not inherit locks from the parent process. If you assume that nobody's ever going to fork a Java process, that's not relevant. (Too bad that won't work for Lucy... we have to support fork().)

I think you're probably safe with Fcntl locks on all non-shared volumes.

> Improve IndexWriter javadoc on locking
> --------------------------------------
>
> Key: LUCENE-1877
> URL: https://issues.apache.org/jira/browse/LUCENE-1877
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Javadocs
> Reporter: Mark Miller
> Priority: Trivial
> Fix For: 2.9
>
>
> A user requested we add a note in IndexWriter alerting users to the availability
> of NativeFSLockFactory (allowing you to avoid retaining locks on abnormal jvm
> exit). Seems reasonable to me - we want users to be able to easily stumble
> upon this class. The below code looks like a good spot to add a note - could
> also improve what's there a bit - opening an IndexWriter does not necessarily
> create a lock file - that would depend on the LockFactory used.
> {code} Opening an IndexWriter creates a lock file for the
> directory in use. Trying to open
> another IndexWriter on the same directory will lead to a
> {@link LockObtainFailedException}. The {@link
> LockObtainFailedException}
> is also thrown if an IndexReader on the same directory is used to delete
> documents
> from the index.{code}
> Anyone remember why NativeFSLockFactory is not the default over
> SimpleFSLockFactory?
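The guard described in the comment above (never open the same lock file twice within one process, so that closing a second filehandle cannot silently drop the lock) can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene's NativeFSLockFactory: the class name GuardedFileLock and its methods are hypothetical, and it assumes that FileChannel.tryLock() is backed by an OS-level lock such as fcntl on POSIX systems, which the JDK does not promise on every platform.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: one OS-level lock per path per JVM. */
final class GuardedFileLock {
    // Canonical paths of lock files this JVM currently has open.
    private static final Set<String> OPEN_PATHS = new HashSet<String>();

    private final String path;
    private RandomAccessFile file;
    private FileLock lock;

    GuardedFileLock(File lockFile) throws IOException {
        this.path = lockFile.getCanonicalPath();
    }

    /** Returns true if the lock was obtained, false if it is already held. */
    synchronized boolean obtain() throws IOException {
        synchronized (OPEN_PATHS) {
            // Refuse to open the file a second time in this process.
            if (!OPEN_PATHS.add(path)) {
                return false;
            }
        }
        boolean locked = false;
        try {
            file = new RandomAccessFile(path, "rw");
            // tryLock() returns null when another process holds the lock.
            lock = file.getChannel().tryLock();
            locked = (lock != null);
            return locked;
        } finally {
            if (!locked) {
                closeAndForget();
            }
        }
    }

    /** Releases the lock if this instance holds it; otherwise does nothing. */
    synchronized void release() throws IOException {
        if (lock == null) {
            return;
        }
        try {
            lock.release();
        } finally {
            closeAndForget();
        }
    }

    // Closes the file and removes the path from the in-process registry.
    private void closeAndForget() throws IOException {
        lock = null;
        try {
            if (file != null) {
                file.close();
            }
        } finally {
            file = null;
            synchronized (OPEN_PATHS) {
                OPEN_PATHS.remove(path);
            }
        }
    }
}
```

The registry check comes before the file is opened, which is the point of the design: a second obtain() on the same path fails without ever creating a second filehandle.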
Re: Porting Java Lucene 2.9 to Lucene.Net (was: RE: Lucene 2.9 RC2 now available for testing)
: My question is, I would prefer to track SVN commits to keep track of
: changes, vs. what I'm doing now. This will allow us to stay weeks
: behind a Java release vs. months or years as it is now. However, while
: I'm subscribed to SVN's commits mailing list, I'm not getting all those
: commits! For example, a commit made this past Friday, I never got an
: email for, while other commits I do. Any idea what may be going on?

I suggest you track things based on a combination of the svn base url (ie: trunk vs a branch) and the specific svn revision number at the moment of your latest checkout. That way you don't even need to subscribe to the commit list: just do an "svn diff -r" whenever you have some time to work on it and see what's been committed since the last time you worked on it.

Hell: you could probably script all of this and have Hudson do it for you.

-Hoss
[jira] Updated: (LUCENE-1853) PhraseQuery Scorer for scoring sub phrase matches
[ https://issues.apache.org/jira/browse/LUCENE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Preetam Rao updated LUCENE-1853:

    Attachment: LUCENE-1853.patch

Removed the dependency on PhraseQuery and created a new Query called "SubPhraseQuery". The new patch adds separate new source files and makes no changes to existing files.

> PhraseQuery Scorer for scoring sub phrase matches
> -------------------------------------------------
>
> Key: LUCENE-1853
> URL: https://issues.apache.org/jira/browse/LUCENE-1853
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Environment: Lucene/Java
> Reporter: Preetam Rao
> Priority: Minor
> Attachments: LUCENE-1853.patch, LUCENE-1853.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> For a query like "homes in new york with swimming pool", if a document's
> field matches only "new york" it should get scored and it should get scored
> higher than two separate matches "new" and "york". Also, a 3 word sub phrase
> match must get scored considerably higher than a 2 word sub phrase match.
> (boost factor should be configurable)
> If a user query is taken as is without parsing and is searched against
> multiple fields, where each sub-phrase can match against a different field,
> this kind of query is useful.
> Using shingles for this use case means each field of each document needs to
> be indexed as shingles of all (1..N)-grams, as well as the query. (Please
> correct me if I am wrong.)
> The scorer could also support
> - ignoring idf and/or field norms (so that factors outside the document
> don't influence scoring)
> - considering only the longest match (for example, a match on "new york" is
> scored and considered rather than "new" furniture and "york" city)
> - ignoring duplicates ("new york" appearing twice or thrice does not make any
> difference)
> This kind of query (Phrase Query with SubPhraseScorer) could be combined with
> DisMax query. For example, something like solr's dismax request handler can
> be made to use this query where we run a user query as it is against all
> fields and configure each field with the above configurations.
> I have also attached a patch with comments and test cases in case my
> description is not clear enough. Would appreciate alternatives or feedback.
> The goal is to give more control via configuration when searching using user
> entered queries against multiple fields where sub phrases have special
> significance.
> Example Usage:
>
> // sub phrase config
> PhraseQuery.SubPhraseConfig conf = new PhraseQuery.SubPhraseConfig();
> conf.ignoreIdf = true;
> conf.ignoreFieldNorms = true;
> conf.matchOnlyLongest = true;
> conf.ignoreDuplicates = true;
> conf.phraseBoost = 2;
> // phrase query as usual
> PhraseQuery pq = new PhraseQuery();
> pq.add(new Term("f", term));
> pq.add(new Term("f", term));
> pq.setSubPhraseConf(conf);
> Hits hits = searcher.search(pq);
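As a rough illustration of the "longest sub phrase" idea in the description above, here is a self-contained sketch that works on plain token lists rather than on a Lucene index. It is not taken from the attached patch: the names SubPhraseSketch, longestSubPhrase and score are hypothetical, and the formula phraseBoost^(length - 1) is only one assumed way to make a 3 word match score considerably higher than a 2 word match.

```java
import java.util.List;

/** Hypothetical sketch of longest-sub-phrase scoring over token lists. */
final class SubPhraseSketch {

    /**
     * Returns the length, in terms, of the longest run of consecutive query
     * terms that also appears as consecutive terms in the field.
     */
    static int longestSubPhrase(List<String> query, List<String> field) {
        int best = 0;
        // prev[j] = length of the common run ending at query[i-2], field[j-1]
        int[] prev = new int[field.size() + 1];
        for (int i = 1; i <= query.size(); i++) {
            int[] cur = new int[field.size() + 1];
            for (int j = 1; j <= field.size(); j++) {
                if (query.get(i - 1).equals(field.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    /**
     * Assumed scoring rule: 0 when nothing matches, otherwise
     * phraseBoost^(length - 1), using only the longest match and
     * ignoring idf, field norms and duplicate occurrences.
     */
    static double score(List<String> query, List<String> field, double phraseBoost) {
        int length = longestSubPhrase(query, field);
        return length == 0 ? 0.0 : Math.pow(phraseBoost, length - 1);
    }
}
```

With the query "homes in new york with swimming pool" and a boost of 2, a field containing "new york city" would score 2.0 under this rule, while "new furniture from york" would score 1.0 because only single terms match.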
[jira] Updated: (LUCENE-1853) SubPhraseQuery for matching and scoring sub phrase matches.
[ https://issues.apache.org/jira/browse/LUCENE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Preetam Rao updated LUCENE-1853:

    Description:

The goal is to give more control via configuration when searching using user entered queries against multiple fields where sub phrases have special significance.

For a query like "homes in new york with swimming pool", if a document's field matches only "new york" it should get scored and it should get scored higher than two separate matches "new" and "york". Also, a 3 word sub phrase match must get scored considerably higher than a 2 word sub phrase match. (boost factor should be configurable)

Using shingles for this use case means each field of each document needs to be indexed as shingles of all (1..N)-grams, as well as the query. (Please correct me if I am wrong.)

The query could also support
- ignoring idf and/or field norms (so that factors outside the document don't influence scoring)
- considering only the longest match (for example, a match on "new york" is scored and considered rather than "new" furniture and "york" city)
- ignoring duplicates ("new york" appearing twice or thrice does not make any difference)

This kind of query could be combined with DisMax query. For example, something like solr's dismax request handler can be made to use this query where we run a user query as it is against all fields and configure each field with the above configurations.

I have also attached a patch with comments and test cases in case my description is not clear enough. Would appreciate alternatives or feedback.

Example Usage:

// sub phrase config
SubPhraseQuery.SubPhraseConfig conf = new SubPhraseQuery.SubPhraseConfig();
conf.ignoreIdf = true;
conf.ignoreFieldNorms = true;
conf.matchOnlyLongest = true;
conf.ignoreDuplicates = true;
conf.phraseBoost = 2;
// phrase query as usual
SubPhraseQuery pq = new SubPhraseQuery();
pq.add(new Term("f", term));
pq.add(new Term("f", term));
pq.setSubPhraseConf(conf);
Hits hits = searcher.search(pq);

    was:

For a query like "homes in new york with swimming pool", if a document's field matches only "new york" it should get scored and it should get scored higher than two separate matches "new" and "york". Also, a 3 word sub phrase match must get scored considerably higher than a 2 word sub phrase match. (boost factor should be configurable)

If a user query is taken as is without parsing and is searched against multiple fields, where each sub-phrase can match against a different field, this kind of query is useful.

Using shingles for this use case means each field of each document needs to be indexed as shingles of all (1..N)-grams, as well as the query. (Please correct me if I am wrong.)

The scorer could also support
- ignoring idf and/or field norms (so that factors outside the document don't influence scoring)
- considering only the longest match (for example, a match on "new york" is scored and considered rather than "new" furniture and "york" city)
- ignoring duplicates ("new york" appearing twice or thrice does not make any difference)

This kind of query (Phrase Query with SubPhraseScorer) could be combined with DisMax query. For example, something like solr's dismax request handler can be made to use this query where we run a user query as it is against all fields and configure each field with the above configurations.

I have also attached a patch with comments and test cases in case my description is not clear enough. Would appreciate alternatives or feedback.

The goal is to give more control via configuration when searching using user entered queries against multiple fields where sub phrases have special significance.

Example Usage:

// sub phrase config
PhraseQuery.SubPhraseConfig conf = new PhraseQuery.SubPhraseConfig();
conf.ignoreIdf = true;
conf.ignoreFieldNorms = true;
conf.matchOnlyLongest = true;
conf.ignoreDuplicates = true;
conf.phraseBoost = 2;
// phrase query as usual
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("f", term));
pq.add(new Term("f", term));
pq.setSubPhraseConf(conf);
Hits hits = searcher.search(pq);

    Summary: SubPhraseQuery for matching and scoring sub phrase matches. (was: PhraseQuery Scorer for scoring sub phrase matches)

Removed the dependency on PhraseQuery so that this can be reviewed and used independently. Made it a separate query with configurations specific to sub phrase matches. The new patch makes no changes to any of the existing files. Please let me know your thoughts.

> SubPhraseQuery for matching and scoring sub phrase matches.
> -----------------------------------------------------------
>
> Key: LUCENE-1853
> URL: https://issues.apache.org/jira/browse/LUCENE-1853
> Project: Lucene - Java
> Issue Type: Im
[jira] Updated: (LUCENE-1853) SubPhraseQuery for matching and scoring sub phrase matches.
[ https://issues.apache.org/jira/browse/LUCENE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Preetam Rao updated LUCENE-1853:

    Remaining Estimate: (was: 336h)
    Original Estimate: (was: 336h)

> SubPhraseQuery for matching and scoring sub phrase matches.
> -----------------------------------------------------------
>
> Key: LUCENE-1853
> URL: https://issues.apache.org/jira/browse/LUCENE-1853
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Environment: Lucene/Java
> Reporter: Preetam Rao
> Priority: Minor
> Attachments: LUCENE-1853.patch, LUCENE-1853.patch
>
>
> The goal is to give more control via configuration when searching using user
> entered queries against multiple fields where sub phrases have special
> significance.
> For a query like "homes in new york with swimming pool", if a document's
> field matches only "new york" it should get scored and it should get scored
> higher than two separate matches "new" and "york". Also, a 3 word sub phrase
> match must get scored considerably higher than a 2 word sub phrase match.
> (boost factor should be configurable)
> Using shingles for this use case means each field of each document needs to
> be indexed as shingles of all (1..N)-grams, as well as the query. (Please
> correct me if I am wrong.)
> The query could also support
> - ignoring idf and/or field norms (so that factors outside the document
> don't influence scoring)
> - considering only the longest match (for example, a match on "new york" is
> scored and considered rather than "new" furniture and "york" city)
> - ignoring duplicates ("new york" appearing twice or thrice does not make any
> difference)
> This kind of query could be combined with DisMax query. For example,
> something like solr's dismax request handler can be made to use this query
> where we run a user query as it is against all fields and configure each
> field with the above configurations.
> I have also attached a patch with comments and test cases in case my
> description is not clear enough. Would appreciate alternatives or feedback.
> Example Usage:
>
> // sub phrase config
> SubPhraseQuery.SubPhraseConfig conf = new SubPhraseQuery.SubPhraseConfig();
> conf.ignoreIdf = true;
> conf.ignoreFieldNorms = true;
> conf.matchOnlyLongest = true;
> conf.ignoreDuplicates = true;
> conf.phraseBoost = 2;
> // phrase query as usual
> SubPhraseQuery pq = new SubPhraseQuery();
> pq.add(new Term("f", term));
> pq.add(new Term("f", term));
> pq.setSubPhraseConf(conf);
> Hits hits = searcher.search(pq);
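To make the shingle cost mentioned in the description concrete, the following sketch enumerates every (1..N)-gram of a token sequence. It is a plain-Java illustration with hypothetical names (ShingleSketch, shingles) and does not use or represent any Lucene analyzer; it only shows how many extra terms a shingle-based approach would have to index per field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: enumerate all word n-grams of size 1..maxN. */
final class ShingleSketch {

    /** Returns all shingles of 1..maxN consecutive tokens, shortest first. */
    static List<String> shingles(String[] tokens, int maxN) {
        List<String> out = new ArrayList<String>();
        for (int n = 1; n <= maxN; n++) {
            for (int start = 0; start + n <= tokens.length; start++) {
                StringBuilder sb = new StringBuilder();
                for (int k = start; k < start + n; k++) {
                    if (k > start) {
                        sb.append(' ');
                    }
                    sb.append(tokens[k]);
                }
                out.add(sb.toString());
            }
        }
        return out;
    }
}
```

For the seven-word query "homes in new york with swimming pool" with N = 7, this yields 7 + 6 + 5 + 4 + 3 + 2 + 1 = 28 shingles, which is the kind of growth the description is weighing against a dedicated query.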