[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846261#action_12846261 ] Jason Rutherglen commented on LUCENE-2312: -- I think the easiest way to test out the concurrency is to add a flush method to ByteBlockPool, then allocate a read-only version of the buffers array (not copying the byte arrays, just the first-dimension pointers). The only issue is reworking the code to read from the read-only array and write to the write-only array... > Search on IndexWriter's RAM Buffer > -- > > Key: LUCENE-2312 > URL: https://issues.apache.org/jira/browse/LUCENE-2312 > Project: Lucene - Java > Issue Type: New Feature > Components: Search >Affects Versions: 3.0.1 >Reporter: Jason Rutherglen >Assignee: Michael Busch > Fix For: 3.1 > > > In order to offer users near-realtime search, without incurring > an indexing performance penalty, we can implement search on > IndexWriter's RAM buffer. This is the buffer that is filled in > RAM as documents are indexed. Currently the RAM buffer is > flushed to the underlying directory (usually disk) before being > made searchable. > Today's Lucene-based NRT systems must incur the cost of merging > segments, which can slow indexing. > Michael Busch has good suggestions regarding how to handle deletes using max > doc ids. > https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 > The area that isn't fully fleshed out is the terms dictionary, > which needs to be sorted prior to queries executing. Currently > IW implements a specialized hash table. Michael B has a > suggestion here: > https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915 -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
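The read-only snapshot idea in Jason's comment can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions: `SketchBlockPool`, `writeByte`, `snapshot` and `Snapshot` are hypothetical names, not Lucene's actual `ByteBlockPool` API. The writer appends into fixed-size `byte[]` blocks; `snapshot()` copies only the outer `byte[][]` array (the first-dimension pointers) plus the current length, so a reader shares the blocks but never looks past what had been written when the snapshot was taken.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a block pool; NOT Lucene's ByteBlockPool API.
final class SketchBlockPool {
  static final int BLOCK_SIZE = 8; // tiny on purpose, so a few bytes span several blocks

  private byte[][] buffers = new byte[4][];
  private int bufferCount = 0;       // number of blocks allocated so far
  private int byteUpto = BLOCK_SIZE; // write position in the current block (full => allocate)
  private long totalBytes = 0;       // bytes written overall

  /** Writer side: append one byte, allocating a new block when the current one is full. */
  void writeByte(byte b) {
    if (byteUpto == BLOCK_SIZE) {
      if (bufferCount == buffers.length) {
        // Grow only the outer array; the existing byte[] blocks are kept, not copied.
        buffers = Arrays.copyOf(buffers, buffers.length * 2);
      }
      buffers[bufferCount++] = new byte[BLOCK_SIZE];
      byteUpto = 0;
    }
    buffers[bufferCount - 1][byteUpto++] = b;
    totalBytes++;
  }

  /** Publish a read-only view: copy the first-dimension pointers only, plus the current length. */
  Snapshot snapshot() {
    return new Snapshot(Arrays.copyOf(buffers, bufferCount), totalBytes);
  }

  /** Reader side: sees exactly the bytes written before snapshot() was called. */
  static final class Snapshot {
    private final byte[][] blocks; // shared with the writer; bytes below `length` never change
    private final long length;

    Snapshot(byte[][] blocks, long length) {
      this.blocks = blocks;
      this.length = length;
    }

    long length() {
      return length;
    }

    byte readByte(long pos) {
      if (pos < 0 || pos >= length) {
        throw new IndexOutOfBoundsException("pos=" + pos + ", length=" + length);
      }
      return blocks[(int) (pos / BLOCK_SIZE)][(int) (pos % BLOCK_SIZE)];
    }
  }
}
```

Because the blocks are append-only, bytes below a snapshot's length are never rewritten, which is what makes sharing them safe. Handing a snapshot to another thread would still need safe publication (for example through a `volatile` field or a lock), which this sketch does not show.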
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846220#action_12846220 ] Jason Rutherglen commented on LUCENE-2324: -- Michael, agreed. Can you outline how you think we should proceed, then? > Per thread DocumentsWriters that write their own private segments > - > > Key: LUCENE-2324 > URL: https://issues.apache.org/jira/browse/LUCENE-2324 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 3.1 > > > See LUCENE-2293 for motivation and more details. > I'm copying here Mike's summary he posted on 2293: > Change the approach for how we buffer in RAM to a more isolated > approach, whereby IW has N fully independent RAM segments > in-process and when a doc needs to be indexed it's added to one of > them. Each segment would also write its own doc stores and > "normal" segment merging (not the inefficient merge we now do on > flush) would merge them. This should be a good simplification in > the chain (eg maybe we can remove the *PerThread classes). The > segments can flush independently, letting us make much better > concurrent use of IO & CPU.
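The per-thread layout in the quoted summary (N fully independent RAM segments, each document added to one of them, each segment able to flush on its own) might look roughly like this. Every name here (`PerThreadSegments`, `RamSegment`, documents modeled as plain strings) is a hypothetical illustration, not Lucene's DocumentsWriter API; real segments would hold postings and doc stores rather than a list of strings. Each indexing thread is bound round-robin to one slot the first time it indexes, and flushing one segment leaves the others untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of N independent in-RAM segments; not Lucene's DocumentsWriter API.
final class PerThreadSegments {
  /** One private RAM buffer. Documents are plain strings here purely for illustration. */
  static final class RamSegment {
    private List<String> docs = new ArrayList<>();

    synchronized void add(String doc) {
      docs.add(doc);
    }

    /** Flush this segment alone: hand back its buffered docs and start a fresh buffer. */
    synchronized List<String> flush() {
      List<String> flushed = docs;
      docs = new ArrayList<>();
      return flushed;
    }

    synchronized int size() {
      return docs.size();
    }
  }

  private final RamSegment[] segments;
  private final AtomicInteger nextSlot = new AtomicInteger();
  private final ThreadLocal<Integer> slot; // which segment the current thread writes to

  PerThreadSegments(int n) {
    segments = new RamSegment[n];
    for (int i = 0; i < n; i++) {
      segments[i] = new RamSegment();
    }
    // Bind each thread to a segment round-robin, the first time it indexes a document.
    slot = ThreadLocal.withInitial(() -> Math.floorMod(nextSlot.getAndIncrement(), n));
  }

  void addDocument(String doc) {
    segments[slot.get()].add(doc);
  }

  RamSegment segment(int i) {
    return segments[i];
  }
}
```

Binding threads to slots keeps each segment's lock uncontended when there is one indexing thread per slot; with more threads than slots, threads share a slot and serialize on its lock, but other segments are unaffected.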
[jira] Issue Comment Edited: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846189#action_12846189 ] Uwe Schindler edited comment on LUCENE-2326 at 3/16/10 11:17 PM: - Here is the patch. Before applying it, do the following (in the main checkout folder):
{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src backwards/src
svn propset svn:externals "data -r500 svn://svn.tartarus.org/snowball/trunk/data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}
Then apply the patch and run svn up.
was (Author: thetaphi): Here is the patch. Before applying it, do the following (in the main checkout folder):
{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src backwards/src
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}
Then apply the patch and run svn up.
> Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards > branch and linking snowball tests by svn:externals > --- > > Key: LUCENE-2326 > URL: https://issues.apache.org/jira/browse/LUCENE-2326 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: Flex Branch, 3.1 > > Attachments: LUCENE-2326.patch > > > As we often need to update backwards tests together with trunk and always > have to update the branch first, record rev no, and update build.xml, I would > simply like to do an svn copy/move of the backwards branch.
> After a release, this is also simply done: > {code} > svn rm backwards > svn cp releasebranch backwards > {code} > This way we can commit in one pass and create patches in one pass. > The snowball tests are currently downloaded by svn.exe, too. These need a > fixed version for checkout. I would like to change this to use svn:externals. > Will provide a patch soon.
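The two versions of the `svn propset svn:externals` command in Uwe's edited comment differ only in the layout of the externals definition: `-r500 URL data` is the layout introduced in Subversion 1.5, while `data -r500 URL` is the older layout, which newer clients still accept. As a hypothetical illustration (`ExternalsLine` is not part of any Subversion API), this parser shows that both lines describe the same external. It assumes the URL is absolute (contains `://`) and does not handle the relative-URL forms Subversion also supports; a peg revision (`URL@REV`) would simply stay attached to the URL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Subversion API: parse one svn:externals definition line.
final class ExternalsLine {
  final String dir;
  final String revision; // null when no -r option is present
  final String url;

  private ExternalsLine(String dir, String revision, String url) {
    this.dir = dir;
    this.revision = revision;
    this.url = url;
  }

  /** Accepts "DIR [-rN] URL" (pre-1.5 layout) and "[-rN] URL DIR" (1.5+ layout). */
  static ExternalsLine parse(String line) {
    String[] tokens = line.trim().split("\\s+");
    String revision = null;
    List<String> rest = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      String tok = tokens[i];
      if (tok.equals("-r") && i + 1 < tokens.length) {
        revision = tokens[++i];      // "-r 500"
      } else if (tok.startsWith("-r") && tok.length() > 2) {
        revision = tok.substring(2); // "-r500"
      } else {
        rest.add(tok);
      }
    }
    if (rest.size() != 2) {
      throw new IllegalArgumentException("expected a directory and a URL: " + line);
    }
    // Simplifying assumption: the URL is the token containing "://".
    boolean urlFirst = rest.get(0).contains("://");
    String url = urlFirst ? rest.get(0) : rest.get(1);
    String dir = urlFirst ? rest.get(1) : rest.get(0);
    return new ExternalsLine(dir, revision, url);
  }
}
```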
[jira] Commented: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846198#action_12846198 ] Uwe Schindler commented on LUCENE-2326: --- I added one thing (as discussed with rmuir): as the snowball test data is too large, I excluded it from the src jar. The test will not fail but will instead print a warning that the data is missing, so the test also passes if, e.g., Hudson fails to check out the external svn repository.
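The skip-with-a-warning behavior Uwe describes can be sketched as a small guard. The class and method names, and the idea of returning the number of languages checked, are illustrative assumptions rather than the actual test code in LUCENE-2326.patch; the point is only that a missing data directory makes the test return early instead of failing.

```java
import java.io.File;

// Hypothetical sketch of the "warn instead of fail" guard; not the code from LUCENE-2326.patch.
final class SnowballDataGuard {
  /** True if the externals-provided data directory exists; otherwise prints a warning. */
  static boolean dataAvailable(File dataDir) {
    if (dataDir.isDirectory()) {
      return true;
    }
    System.err.println("WARNING: snowball test data not found at " + dataDir.getPath()
        + " (is the svn:externals checkout missing?); skipping vocabulary tests.");
    return false;
  }

  /** Shape of a data-driven test: returns early (and passes) when the data is missing. */
  static int checkLanguages(File dataDir, String... languages) {
    if (!dataAvailable(dataDir)) {
      return 0; // nothing checked, but nothing failed either
    }
    int checked = 0;
    for (String language : languages) {
      File languageDir = new File(dataDir, language);
      // A real test would run the stemmer over this language's word list and compare
      // against the expected output; that comparison is omitted in this sketch.
      if (languageDir.isDirectory()) {
        checked++;
      }
    }
    return checked;
  }
}
```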
[jira] Issue Comment Edited: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846189#action_12846189 ] Uwe Schindler edited comment on LUCENE-2326 at 3/16/10 11:01 PM: - Here is the patch. Before applying it, do the following (in the main checkout folder):
{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src backwards/src
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}
Then apply the patch and run svn up.
was (Author: thetaphi): Here is the patch. Before applying it, do the following (in the main checkout folder):
{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src .
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}
Then apply the patch and run svn up.
Re: lucene and solr trunk
Duh -- I meant to reply to Hoss' proposal, below: On Tue, Mar 16, 2010 at 5:55 PM, Michael McCandless wrote: > +1 > > I like this proposal! > > I agree we should not preclude the future (modules), let's just not > hold up dev today until we solve it. > > I agree your side by side solution would allow for us to later factor > up modules (eg analyzers). > > Mike > > On Tue, Mar 16, 2010 at 5:47 PM, Michael McCandless > wrote: >> But it's actually the reverse? Solr depends on Lucene but not vice/versa. >> >> (If instead I proposed making Solr a subdir of Lucene then I'd agree) >> >> So... if you checkout only lucene, you can cd there and do all you do >> today with Lucene ("ant test", "ant dist", "svn diff", etc.). >> >> If you checkout solr, you can cd there and "ant test" will run all of >> Lucene's and all of Solr's tests. "svn diff" will include any changes >> to lucene and to solr. >> >> Ie this achieves want we want -- Solr to depend on Lucene but not vice >> versa, right? >> >> Mike >> >> On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera wrote: >>> I have to agree w/ Jake that putting Lucene under Solr gives the impression >>> as if suddenly Lucene became dependent on it ... and for really no good >>> reasons. Are we making that decision to simplify the build of Solr? What are >>> the problems Solr faces today w.r.t. its build and using a Lucene release or >>> trunk revision? >>> >>> I didn't follow the Lucene/Solr merge on general@, because I didn't even >>> know such a beast exists. So I guess I'm missing something ... >>> >>> Shai >>> >>> On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix wrote: On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley wrote: > > > Chiming in just a bit here - isn't there any concern that independent > > of > > whether or not people "can" > > build lucene without checking out solr, the mere fact that Lucene will > > be > > effectively a "subdirectory" > > of solr... 
is there no concern that there will then be a perception > > that Lucene is a subproject of > > Solr, instead of vice-versa? > > Who would have this perception? > Casual users will be using downloads. Developers and dev managers at companies doing build vs. buy decisions regarding whether they will do one of the following: 1) pay big bucks to get FAST or whatever 2) use Solr (free/cheap!) 3) pay [variable] bucks to build their own with Lucene 4) pay [variable but high] to build their own from scratch I'm not concerned with casual downloaders. I'm talking about the companies and people who may or may not be interested in making multi-million dollar decisions regarding using or not using Lucene or Solr. -jake >>> >> >
Re: lucene and solr trunk
What about tagging and branching? When we cut a Lucene release we also tag Solr, even though it's not being released? Michael On 3/16/10 3:47 PM, Michael McCandless wrote: But it's actually the reverse? Solr depends on Lucene but not vice/versa. (If instead I proposed making Solr a subdir of Lucene then I'd agree) So... if you checkout only lucene, you can cd there and do all you do today with Lucene ("ant test", "ant dist", "svn diff", etc.). If you checkout solr, you can cd there and "ant test" will run all of Lucene's and all of Solr's tests. "svn diff" will include any changes to lucene and to solr. Ie this achieves want we want -- Solr to depend on Lucene but not vice versa, right? Mike On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera wrote: I have to agree w/ Jake that putting Lucene under Solr gives the impression as if suddenly Lucene became dependent on it ... and for really no good reasons. Are we making that decision to simplify the build of Solr? What are the problems Solr faces today w.r.t. its build and using a Lucene release or trunk revision? I didn't follow the Lucene/Solr merge on general@, because I didn't even know such a beast exists. So I guess I'm missing something ... Shai On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix wrote: On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley wrote: Chiming in just a bit here - isn't there any concern that independent of whether or not people "can" build lucene without checking out solr, the mere fact that Lucene will be effectively a "subdirectory" of solr... is there no concern that there will then be a perception that Lucene is a subproject of Solr, instead of vice-versa? Who would have this perception? Casual users will be using downloads. Developers and dev managers at companies doing build vs. buy decisions regarding whether they will do one of the following: 1) pay big bucks to get FAST or whatever 2) use Solr (free/cheap!) 
3) pay [variable] bucks to build their own with Lucene 4) pay [variable but high] to build their own from scratch I'm not concerned with casual downloaders. I'm talking about the companies and people who may or may not be interested in making multi-million dollar decisions regarding using or not using Lucene or Solr. -jake
Re: lucene and solr trunk
+1 I like this proposal! I agree we should not preclude the future (modules), let's just not hold up dev today until we solve it. I agree your side by side solution would allow for us to later factor up modules (eg analyzers). Mike On Tue, Mar 16, 2010 at 5:47 PM, Michael McCandless wrote: > But it's actually the reverse? Solr depends on Lucene but not vice/versa. > > (If instead I proposed making Solr a subdir of Lucene then I'd agree) > > So... if you checkout only lucene, you can cd there and do all you do > today with Lucene ("ant test", "ant dist", "svn diff", etc.). > > If you checkout solr, you can cd there and "ant test" will run all of > Lucene's and all of Solr's tests. "svn diff" will include any changes > to lucene and to solr. > > Ie this achieves want we want -- Solr to depend on Lucene but not vice > versa, right? > > Mike > > On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera wrote: >> I have to agree w/ Jake that putting Lucene under Solr gives the impression >> as if suddenly Lucene became dependent on it ... and for really no good >> reasons. Are we making that decision to simplify the build of Solr? What are >> the problems Solr faces today w.r.t. its build and using a Lucene release or >> trunk revision? >> >> I didn't follow the Lucene/Solr merge on general@, because I didn't even >> know such a beast exists. So I guess I'm missing something ... >> >> Shai >> >> On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix wrote: >>> >>> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley wrote: > Chiming in just a bit here - isn't there any concern that independent > of > whether or not people "can" > build lucene without checking out solr, the mere fact that Lucene will > be > effectively a "subdirectory" > of solr... is there no concern that there will then be a perception > that Lucene is a subproject of > Solr, instead of vice-versa? Who would have this perception? Casual users will be using downloads. >>> >>> Developers and dev managers at companies doing build vs. 
buy decisions >>> regarding >>> whether they will do one of the following: >>> 1) pay big bucks to get FAST or whatever >>> 2) use Solr (free/cheap!) >>> 3) pay [variable] bucks to build their own with Lucene >>> 4) pay [variable but high] to build their own from scratch >>> I'm not concerned with casual downloaders. I'm talking about the >>> companies and people who >>> may or may not be interested in making multi-million dollar decisions >>> regarding using or >>> not using Lucene or Solr. >>> -jake >> >
[jira] Updated: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2326: -- Attachment: LUCENE-2326.patch Here is the patch. Before applying it, do the following (in the main checkout folder):
{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src .
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}
Then apply the patch and run svn up.
Re: lucene and solr trunk
But it's actually the reverse? Solr depends on Lucene but not vice versa. (If instead I proposed making Solr a subdir of Lucene, then I'd agree.) So... if you check out only lucene, you can cd there and do all you do today with Lucene ("ant test", "ant dist", "svn diff", etc.). If you check out solr, you can cd there and "ant test" will run all of Lucene's and all of Solr's tests; "svn diff" will include any changes to lucene and to solr. I.e., this achieves what we want -- Solr depends on Lucene but not vice versa, right? Mike On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera wrote: > I have to agree w/ Jake that putting Lucene under Solr gives the impression > as if suddenly Lucene became dependent on it ... and for really no good > reasons. Are we making that decision to simplify the build of Solr? What are > the problems Solr faces today w.r.t. its build and using a Lucene release or > trunk revision? > > I didn't follow the Lucene/Solr merge on general@, because I didn't even > know such a beast exists. So I guess I'm missing something ... > > Shai > > On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix wrote: >> >> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley wrote: >>> >>> > Chiming in just a bit here - isn't there any concern that independent >>> > of >>> > whether or not people "can" >>> > build lucene without checking out solr, the mere fact that Lucene will >>> > be >>> > effectively a "subdirectory" >>> > of solr... is there no concern that there will then be a perception >>> > that Lucene is a subproject of >>> > Solr, instead of vice-versa? >>> >>> Who would have this perception? >>> Casual users will be using downloads. >> >> Developers and dev managers at companies doing build vs. buy decisions >> regarding >> whether they will do one of the following: >> 1) pay big bucks to get FAST or whatever >> 2) use Solr (free/cheap!) 
>> 3) pay [variable] bucks to build their own with Lucene >> 4) pay [variable but high] to build their own from scratch >> I'm not concerned with casual downloaders. I'm talking about the >> companies and people who >> may or may not be interested in making multi-million dollar decisions >> regarding using or >> not using Lucene or Solr. >> -jake >
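The dependency direction Mike describes can be modeled as a tiny graph. In this hypothetical sketch (`ModuleGraph` is illustrative, not part of Ant or Maven), running tests from a module's directory covers that module plus everything it transitively depends on: from `lucene` only Lucene's tests run, while from `solr` both Solr's and Lucene's tests run, and no edge ever points from `lucene` back to `solr`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed layout; not Ant or Maven code.
final class ModuleGraph {
  private final Map<String, List<String>> dependsOn = new HashMap<>();

  void addModule(String name, String... deps) {
    dependsOn.put(name, Arrays.asList(deps));
  }

  /** Modules whose tests run when testing from `module`: itself plus its transitive dependencies. */
  Set<String> testedFrom(String module) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> stack = new ArrayDeque<>();
    stack.push(module);
    while (!stack.isEmpty()) {
      String current = stack.pop();
      if (!seen.add(current)) {
        continue; // already visited (guards against shared or cyclic dependencies)
      }
      for (String dep : dependsOn.getOrDefault(current, Collections.emptyList())) {
        stack.push(dep);
      }
    }
    return seen;
  }
}
```

Usage: after `addModule("lucene")` and `addModule("solr", "lucene")`, `testedFrom("lucene")` is `{lucene}` and `testedFrom("solr")` is `{solr, lucene}`, which is the "Solr depends on Lucene but not vice versa" property.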
Re: lucene and solr trunk
Solr and Lucene dev is now merged -- that has already passed. If that scares customers away, that's a risk we take -- the benefits of merged dev outweigh it, in my opinion. The incremental risk that the details of our svn URLs will scare people away seems negligible. And we can always change this later, if we decide to. I think what's important now is that we pick something to unblock trunk dev. Sure, people can keep working on the branch, but I think it'd be better if we get this simple "svn move" done so that we can get normal dev going on a shared trunk again. Mike On Tue, Mar 16, 2010 at 5:28 PM, Jake Mannix wrote: > > On Tue, Mar 16, 2010 at 3:10 PM, Yonik Seeley wrote: >> >> On Tue, Mar 16, 2010 at 6:01 PM, Jake Mannix >> wrote: >> > I'm not concerned with casual downloaders. I'm talking >> >> > about the companies and people who may or may not be >> >> > interested in making multi-million dollar decisions regarding >> >> > using or not using Lucene or Solr. >> >> Heh - multi-million dollar decisions after a quick glance at an SVN url? > > Clearly not. But just as I think that making the development of > both solr and lucene easier is a noble goal, I think that giving > people the impression that by choosing to "go with Lucene" > *means* they "go with Solr" as their end solution is not what > we want to do. There are some places where Solr is just not > appropriate but Lucene may be. > Will this impression be "caused" by a SVN directory url > alone? Of course not. Merging committer lists, locked > releases, *and* a SVN url which shows this? Yes, I > think the kinds of VPs and CTO's I've talked to and > tried to help decide whether to go with an open-source > search solution could indeed start to get the feeling that > there's really just one apache solution, the > "Solr/Lucene solution". 
And if they look into Solr and > decide that this particular application is not for them, > they may then not look deep enough to see whether > doing a custom Lucene application *would be*. > -jake
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 3:10 PM, Yonik Seeley wrote: > On Tue, Mar 16, 2010 at 6:01 PM, Jake Mannix > wrote: > > I'm not concerned with casual downloaders. I'm talking > > about the companies and people who may or may not be > > interested in making multi-million dollar decisions regarding > > using or not using Lucene or Solr. > > Heh - multi-million dollar decisions after a quick glance at an SVN url? > Clearly not. But just as I think that making the development of both solr and lucene easier is a noble goal, I think that giving people the impression that choosing to "go with Lucene" *means* they "go with Solr" as their end solution is not what we want to do. There are some places where Solr is just not appropriate but Lucene may be. Will this impression be "caused" by an SVN directory url alone? Of course not. Merging committer lists, locked releases, *and* an SVN url which shows this? Yes, I think the kinds of VPs and CTOs I've talked to and tried to help decide whether to go with an open-source search solution could indeed start to get the feeling that there's really just one apache solution, the "Solr/Lucene solution". And if they look into Solr and decide that this particular application is not for them, they may then not look deeply enough to see whether doing a custom Lucene application *would be*. -jake
Re: lucene and solr trunk
I have to agree w/ Jake that putting Lucene under Solr gives the impression that Lucene suddenly became dependent on it ... and for no really good reason. Are we making that decision to simplify the build of Solr? What problems does Solr face today w.r.t. its build and using a Lucene release or trunk revision? I didn't follow the Lucene/Solr merge on general@, because I didn't even know such a beast existed. So I guess I'm missing something ... Shai On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix wrote: > On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley wrote: >> >> > Chiming in just a bit here - isn't there any concern that independent >> of >> > whether or not people "can" >> > build lucene without checking out solr, the mere fact that Lucene will >> be >> > effectively a "subdirectory" >> > of solr... is there no concern that there will then be a perception >> that Lucene is a subproject of >> > Solr, instead of vice-versa? >> >> Who would have this perception? >> Casual users will be using downloads. >> > > Developers and dev managers at companies doing build vs. buy decisions > regarding > whether they will do one of the following: > > 1) pay big bucks to get FAST or whatever > 2) use Solr (free/cheap!) > 3) pay [variable] bucks to build their own with Lucene > 4) pay [variable but high] to build their own from scratch > > I'm not concerned with casual downloaders. I'm talking about the companies > and people who > may or may not be interested in making multi-million dollar decisions > regarding using or > not using Lucene or Solr. > > -jake >
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 6:01 PM, Jake Mannix wrote: > I'm not concerned with casual downloaders. I'm talking about the companies > and people who > may or may not be interested in making multi-million dollar decisions > regarding using or > not using Lucene or Solr. Heh - multi-million dollar decisions after a quick glance at an SVN url? -Yonik
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley wrote: > > > Chiming in just a bit here - isn't there any concern that independent of > > whether or not people "can" > > build lucene without checking out solr, the mere fact that Lucene will be > > effectively a "subdirectory" > > of solr... is there no concern that there will then be a perception that > Lucene is a subproject of > > Solr, instead of vice-versa? > > Who would have this perception? > Casual users will be using downloads. > Developers and dev managers at companies doing build vs. buy decisions regarding whether they will do one of the following: 1) pay big bucks to get FAST or whatever 2) use Solr (free/cheap!) 3) pay [variable] bucks to build their own with Lucene 4) pay [variable but high] to build their own from scratch I'm not concerned with casual downloaders. I'm talking about the companies and people who may or may not be interested in making multi-million dollar decisions regarding using or not using Lucene or Solr. -jake
Re: lucene and solr trunk
Where would the modules live? I'm not sure if I sent it on this thread or somewhere else, but what about my proposal to have all three sitting under their own directories, w/ their own trunk/branch/tags, and if it's easier for dev then put all three under one root (for permission management maybe)? Shai On Tue, Mar 16, 2010 at 11:53 PM, Yonik Seeley wrote: > On Tue, Mar 16, 2010 at 5:42 PM, Jake Mannix > wrote: > > On Tue, Mar 16, 2010 at 2:31 PM, Michael McCandless > > wrote: > >> > >> If we move lucene under Solr's existing svn path, ie: > >> > >> /solr/trunk/lucene > > > > Chiming in just a bit here - isn't there any concern that independent of > > whether or not people "can" > > build lucene without checking out solr, the mere fact that Lucene will be > > effectively a "subdirectory" > > of solr... is there no concern that there will then be a perception that > Lucene is a subproject of > > Solr, instead of vice-versa? > > Who would have this perception? > Casual users will be using downloads. > > Likewise, should solr be concerned that it's currently under a lucene > URL? How many casual users actually understand the difference between > the lucene TLP and the lucene java subproject? > > This is really about what makes most sense for development. > > -Yonik
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 5:42 PM, Jake Mannix wrote: > On Tue, Mar 16, 2010 at 2:31 PM, Michael McCandless > wrote: >> >> If we move lucene under Solr's existing svn path, ie: >> >> /solr/trunk/lucene > > Chiming in just a bit here - isn't there any concern that independent of > whether or not people "can" > build lucene without checking out solr, the mere fact that Lucene will be > effectively a "subdirectory" > of solr... is there no concern that there will then be a perception that > Lucene is a subproject of > Solr, instead of vice-versa? Who would have this perception? Casual users will be using downloads. Likewise, should solr be concerned that it's currently under a lucene URL? How many casual users actually understand the difference between the lucene TLP and the lucene java subproject? This is really about what makes most sense for development. -Yonik
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 2:31 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > If we move lucene under Solr's existing svn path, ie: > > /solr/trunk/lucene Chiming in just a bit here - isn't there any concern that independent of whether or not people "can" build lucene without checking out solr, the mere fact that Lucene will be effectively a "subdirectory" of solr... is there no concern that there will then be a perception that Lucene is a subproject of Solr, instead of vice-versa? The way mavenified projects work is that there would instead be a top level in which both solr and lucene would be submodules (and thus also subdirectories in svn), with a dependency from solr to lucene (in the pom.xml for maven, but easy enough to do with the build.xml with ant). Checking out solr without lucene should be doable (using snapshot jars from lucene trunk nightly, maybe?), and the reverse should be easy, as could be checking out the top-level and getting everything (including a top-level build.xml which 's or antcall's into the subdirectory build.xmls). It seems really weird to have Lucene appear as a subdirectory of Solr, especially for people out there who aren't using Solr. -jake
Re: lucene and solr trunk
The primary concern seems to be ensuring that, once we merge svn, one can still check out & build & run tests/etc for Lucene alone. If we move lucene under Solr's existing svn path, ie: /solr/trunk/lucene and then fix up solr's build files to go and compile sources from the lucene dir, run tests there, etc., then one can still check out & run lucene fully independently -- this addresses that concern? So how about we start with this approach? Progress not perfection... If somehow this layout is a problem then we can just move things around, again. A lot of great progress has already been made on the temporary branch -- Solr runs fine on Lucene trunk! And also on flex. We need to settle an initial svn structure so the changes on the branch can be fully reviewed & then committed to trunk and normal dev can proceed... We don't need to solve how modules/contribs, etc., are going to be fixed now -- that all can come later. IRC issues, using GIT instead, etc. should also be discussed separately. Let's just pick a place in svn and free up ongoing dev... Mike
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846128#action_12846128 ] Michael Busch commented on LUCENE-2324: --- I think we all agree that we want to have a single writer thread, multi reader thread model. Only then can the thread-safety problems in LUCENE-2312 be reduced to visibility (no write-locking). So I think making this change first makes the most sense. It involves a bit of boring refactoring work, unfortunately. > Per thread DocumentsWriters that write their own private segments > - > > Key: LUCENE-2324 > URL: https://issues.apache.org/jira/browse/LUCENE-2324 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 3.1 > > > See LUCENE-2293 for motivation and more details. > I'm copying here Mike's summary he posted on 2293: > Change the approach for how we buffer in RAM to a more isolated > approach, whereby IW has N fully independent RAM segments > in-process and when a doc needs to be indexed it's added to one of > them. Each segment would also write its own doc stores and > "normal" segment merging (not the inefficient merge we now do on > flush) would merge them. This should be a good simplification in > the chain (eg maybe we can remove the *PerThread classes). The > segments can flush independently, letting us make much better > concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
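Why a single writer reduces the LUCENE-2312 problem to visibility can be illustrated with a minimal Java sketch. `SingleWriterLog` is hypothetical and is not Lucene code; it assumes exactly one thread ever calls `append`, and it relies on the Java memory model rule that a write to a volatile field happens-before every later read of that field. Readers therefore see all values below the published count without taking any lock.

```java
/**
 * Hypothetical single-writer, multi-reader buffer (not Lucene code).
 * Exactly one thread may call append(); any thread may call size() or sum().
 */
final class SingleWriterLog {
    private final int[] values;
    // Publication point: a volatile write here happens-before any later
    // volatile read of it, so values[0..count) are visible to readers.
    private volatile int count;

    SingleWriterLog(int capacity) {
        values = new int[capacity];
    }

    /** Writer thread only: plain write of the value, then volatile publish. */
    void append(int value) {
        int c = count;
        if (c == values.length) {
            throw new IllegalStateException("buffer full");
        }
        values[c] = value;
        count = c + 1;
    }

    /** Any thread: number of values currently visible. */
    int size() {
        return count;
    }

    /** Any thread: sums a consistent prefix without taking a lock. */
    long sum() {
        int c = count; // read the bound once; everything below it is visible
        long total = 0;
        for (int i = 0; i < c; i++) {
            total += values[i];
        }
        return total;
    }
}
```

The writer never rewrites a slot that has already been published, so readers need no write-locking, only the one volatile read.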
Re: #lucene IRC log [was: RE: lucene and solr trunk]
IRC has been discussed to death at Apache: http://markmail.org/search/?q=IRC+list%3Aorg.apache.incubator.general Look for the spikes... like this: http://markmail.org/search/?q=IRC+list%3Aorg.apache.incubator.general#query:IRC%20list%3Aorg.apache.incubator.general%20date%3A200608%20+page:1+state:facets -Yonik
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846112#action_12846112 ] Jason Rutherglen commented on LUCENE-2324: -- Actually, TermsHashField doesn't need to be concurrent; it's only being written to, and the concurrent terms skiplist (was a btree) holds the reference to the posting list. So I think we're good there, because the terms enum never accesses the terms hash. Nice!
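Here is a minimal sketch of the arrangement described above. It assumes a writer-private hash for lookups and a `java.util.concurrent.ConcurrentSkipListMap` as the shared, sorted terms structure. `RamTermsDictionary` and its integer "postings pointer" are hypothetical stand-ins, not TermsHashPerField or any real Lucene class. Making the postings bytes themselves visible to readers is a separate problem that this sketch does not address.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical RAM terms dictionary (not Lucene code).
 * The writer thread owns the unsorted hash; readers only see the skip list.
 */
final class RamTermsDictionary {
    // Writer-private lookup table: never touched by reader threads.
    private final Map<String, Integer> writerHash = new HashMap<>();
    // Shared, sorted, thread-safe: term -> hypothetical postings pointer.
    private final ConcurrentSkipListMap<String, Integer> sortedTerms =
            new ConcurrentSkipListMap<>();
    private int nextPointer;

    /** Writer thread only: returns the term's postings pointer, adding it if new. */
    int addTerm(String term) {
        Integer pointer = writerHash.get(term);
        if (pointer == null) {
            pointer = nextPointer++;
            writerHash.put(term, pointer);
            sortedTerms.put(term, pointer); // this put is what publishes the term
        }
        return pointer;
    }

    /** Any thread: weakly consistent, sorted view of terms >= lowerInclusive. */
    Iterable<Map.Entry<String, Integer>> termsFrom(String lowerInclusive) {
        return sortedTerms.tailMap(lowerInclusive, true).entrySet();
    }
}
```

Because a terms enum walks only `sortedTerms`, the hash can remain a plain `HashMap` that only the writer thread touches.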
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846110#action_12846110 ] Jason Rutherglen commented on LUCENE-2324: -- NormsWriterPerField has a growing norm byte array; we'd need a way to read/write lock it... I think we have concurrency issues in the TermsHash table? Maybe it'd need to be rewritten to use ConcurrentHashMap?
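One reading of "read/write lock it" for a growing norms array is sketched below with `java.util.concurrent.locks.ReentrantReadWriteLock`. `LockedNorms` is hypothetical and is not how NormsWriterPerField is implemented. It only shows the writer taking the exclusive lock to grow the array or set a norm, while readers take the shared lock.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical growing norms array guarded by a read/write lock (not Lucene code). */
final class LockedNorms {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private byte[] norms = new byte[1];
    private int numDocs; // highest docId set so far, plus one

    /** Writer: may reallocate the array, so it needs the exclusive lock. */
    void setNorm(int docId, byte norm) {
        lock.writeLock().lock();
        try {
            if (docId >= norms.length) {
                norms = Arrays.copyOf(norms, Math.max(docId + 1, norms.length * 2));
            }
            norms[docId] = norm;
            numDocs = Math.max(numDocs, docId + 1);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reader: shared lock; returns 0 for docs without a norm yet. */
    byte getNorm(int docId) {
        lock.readLock().lock();
        try {
            return docId < numDocs ? norms[docId] : 0;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Reader: copies the norms set so far. */
    byte[] snapshot() {
        lock.readLock().lock();
        try {
            return Arrays.copyOf(norms, numDocs);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Readers and the writer still contend on the lock. That is the kind of write-locking the single-writer, visibility-only approach discussed in this thread tries to avoid.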
[jira] Commented: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846108#action_12846108 ] Michael McCandless commented on LUCENE-2326: +1 This sounds sooo much better than what we do now. > Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards > branch and linking snowball tests by svn:externals > --- > > Key: LUCENE-2326 > URL: https://issues.apache.org/jira/browse/LUCENE-2326 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: Flex Branch, 3.1 > > > As we often need to update backwards tests together with trunk and always > have to update the branch first, record rev no, and update build xml, I would > simply like to do a svn copy/move of the backwards branch. > After a release, this is simply also done: > {code} > svn rm backwards > svn cp releasebranch backwards > {code} > By this we can simply commit in one pass, create patches in one pass. > The snowball tests are currently downloaded by svn.exe, too. These need a > fixed version for checkout. I would like to change this to use svn:externals. > Will provide patch, soon.
Re: #lucene IRC log [was: RE: lucene and solr trunk]
On Tue, Mar 16, 2010 at 2:17 PM, Michael Busch wrote: > But at the same time can we make sure that the decisions that are made on > IRC are still being described in a jira issue? +1 Any time something is discussed on IRC, it must be summarized on the lists or in an issue, with the details based on what was discussed, or else it didn't happen. IRC is a great way to hash out ideas, brainstorm, shoot the breeze, vent, etc. Much of what's discussed doesn't pan out... but when stuff does, we always bring it to the lists... Those of us spending some time on IRC have been trying to do exactly that. Maybe we've been falling short sometimes, not providing enough detail, so we should fix that with time. We're all still learning as we go... Also: if an issue is opened and it's missing details, regardless of whether it was born in IRC or some other place, people should simply ask questions, punch holes, etc. When another set of eyes, or the same set of eyes some time later, looks at the issue, very different and healthy iterations happen. Most certainly, if something seems like a good idea during IRC discussions, that does not mean the debate is done -- rather, the issue is opened and lots of other people chime in. Nothing is "decided" on IRC... only ideas are born... that's all. Stepping back, Lucene/Solr are clearly innovating at a fast pace right now, and this is really very healthy. It'd already been fast a few months ago, but it seems to be accelerating... I think that's because suddenly we have quite a few strong [near-] full-time devs here, and because IRC allows for real-time conversations for brainstorming. This is net/net good for both Lucene and Solr, and I think we should try to find a way to make IRC work well so devs that do happen to have the time (and the list will change with time -- bright stars never shine for long) can brainstorm and bring new ideas to the community...
Mike
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846102#action_12846102 ] Jason Rutherglen commented on LUCENE-2324: -- Michael, For LUCENE-2312, I think the searching isn't going to be an issue; I've got basic per-thread doc writers working (though not thoroughly tested). I didn't see a great need to rework all the classes, and even if we did, I'm not sure it would help with the byte array read/write issues. I'd prefer to get a proof of concept more or less working, then refine it from there. I think there are two main design/implementation issues to solve before we can roll something out: 1) A new skip list implementation that writes a new skip entry at specific intervals (ie, single level). Right now trunk has a multilevel skip list that requires the number of docs ahead of time. 2) Figure out the low -> high levels of byte/char/int array visibility to reader threads. The main challenge here is that the DW-related code that utilizes this is really hard for me to understand well enough to know what can be changed without breaking bunches of other stuff as a side effect. If there were a Directory-like class abstraction we could simply override and reimplement, we could do that; maybe there is one, I'm not sure yet. However, if reworking the PerThread classes somehow makes the tie-in to the IO system (eg, the byte array pooling) more abstract and easier to work with, then I'm all for it.
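Item 1 above asks for a skip list that writes a skip entry at fixed intervals as documents arrive, rather than a multilevel one sized from a known doc count. The minimal, single-threaded sketch below illustrates that idea. `SingleLevelSkipPostings` and the interval used in the example are hypothetical and are not Lucene's skip list classes. The sketch also ignores item 2, making these arrays visible to reader threads.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-level skip list over an append-only postings list
 * (not Lucene code). Single-threaded: no visibility guarantees for readers.
 */
final class SingleLevelSkipPostings {
    private final int skipInterval;
    private final List<Integer> docs = new ArrayList<>();         // doc IDs, strictly increasing
    private final List<Integer> skipDocs = new ArrayList<>();     // last doc ID of each full block
    private final List<Integer> skipPointers = new ArrayList<>(); // index in docs just past that block

    SingleLevelSkipPostings(int skipInterval) {
        if (skipInterval <= 0) throw new IllegalArgumentException("skipInterval must be > 0");
        this.skipInterval = skipInterval;
    }

    /** Appends a doc; every skipInterval docs one skip entry is written, so no doc count is needed up front. */
    void add(int docId) {
        if (!docs.isEmpty() && docId <= docs.get(docs.size() - 1)) {
            throw new IllegalArgumentException("doc IDs must increase");
        }
        docs.add(docId);
        if (docs.size() % skipInterval == 0) {
            skipDocs.add(docId);
            skipPointers.add(docs.size());
        }
    }

    /** Returns the first doc ID >= target, or -1 if there is none. */
    int advance(int target) {
        int start = 0;
        // Skip every full block whose last doc is still below the target.
        for (int i = 0; i < skipDocs.size() && skipDocs.get(i) < target; i++) {
            start = skipPointers.get(i);
        }
        for (int i = start; i < docs.size(); i++) {
            if (docs.get(i) >= target) return docs.get(i);
        }
        return -1;
    }

    int numSkipEntries() {
        return skipDocs.size();
    }
}
```

For example, with an interval of 16 and docs 0, 3, ..., 297, `advance(100)` skips the first two blocks and returns 102.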
Re: #lucene IRC log [was: RE: lucene and solr trunk]
: with, "if it didn't happen on the lists, it didn't happen". It's the same as +1 But as the IRC channel gets used more and more, it would *also* be nice if there were an archive of the IRC channel, so that there is a place to go look to understand the back story behind an idea once it's synthesized and posted to the lists/jira. That's the huge advantage IRC has over informal conversations at hackathons, apachecon, and meetups -- there can in fact be easily archivable/parsable/searchable records of the communication. -Hoss
Re: #lucene IRC log [was: RE: lucene and solr trunk]
On Mar 16, 2010, at 3:24 PM, Mark Miller wrote: > On 03/16/2010 02:57 PM, Grant Ingersoll wrote: >> On Mar 16, 2010, at 2:47 PM, Steven A Rowe wrote: >> >> >>> On 03/16/2010 at 6:06 AM, Michael McCandless wrote: >>> Does anyone know how other projects fold in IRC...? >>> I gather from the deafening silence that we'll have to figure it out as we >>> go... >>> >>> I think some (not all) of the discomfort associated with IRC could be >>> addressed with a permanent, searchable, linkable archive of #lucene. >>> >>> I went looking for IRC loggers and found http://colabti.org/. One of the >>> things hosted there is a searchable, linkable permanent archive of several >>> freenode channels. I posted on #irclogger asking about hosting #lucene >>> archive, and apparently all we have to do is ask, after first determining >>> that nobody objects. Here's a link (not incidentally, this is exactly what >>> we will have for #lucene once the service is switched on): >>> >>> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 >>> >>> So, would anybody participating on #lucene object to a permanent archive? >>> >>> (I'm also going to provide a link to this thread on #lucene to make sure >>> everybody there knows about the issue.) >>> >> There's also a lot of chatter that happens on IRC, so logging is going to >> have a lot of noise. I'm still on the fence on what to do. I don't want to >> get in people's way, but we also need to have traceability about decisions, >> and we certainly can't have answers like "We discussed this on IRC and you >> missed it, too bad" happening (not saying that has happened, just saying I >> don't want to see it). >> >> -Grant >> > > Even with logging, I'm against using IRC for making decisions, or as > something people can point to. Even with searchable logging, I think we > should stick with, "if it didn't happen on the lists, it didn't happen". 
It's > the same as when some of us get together and talk about Lucene and Solr - > that's great stuff - you can get a lot done that is a lot harder on the lists > - you can hash a lot out. But I think people should always have the right to > act like it didn't happen - the same as if we are at ApacheCon or something - > we don't come back and say, sorry, you missed all the discussion, but we had > one and this is what we are going to do. We summarize the discussion on the list > (like Mike likes to do with IRC), and answer questions as people have them. I > personally think it's great to come to mini agreements with real-time talk - > then it just has to make its way through the list. > > This isn't a counterpoint to anything you said Grant, just a nice place for > me to drop this. > +1. The ApacheCon talks are a great example of bringing off-list stuff back to the list.
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846084#action_12846084 ] Michael Busch commented on LUCENE-2324: --- Shall we not first try to remove the downstream *PerThread classes and make the DocumentsWriter single-threaded without locking? Then we add a PerThreadDocumentsWriter and a DocumentsWriterThreadBinder, which talks to the PerThreadDWs, and IW talks to the DWTB. We can pick other names :) When that's done we can think about what kind of locking/synchronization/volatile stuff we need for LUCENE-2312.
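The proposed split can be sketched in Java using the names from the comment above, which are placeholders ("We can pick other names"). Both classes here are hypothetical. Each indexing thread is bound to its own `PerThreadDocumentsWriter`, so that class needs no internal locking, and the binder is the only shared object. Keying the binder by `Thread` and representing a document as a `String` are simplifying assumptions. A real binder would also have to handle threads that exit, and flushing is omitted entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-thread writer: only ever used by one thread, so no locking inside. */
final class PerThreadDocumentsWriter {
    private final List<String> bufferedDocs = new ArrayList<>(); // placeholder for in-RAM segment state

    void addDocument(String doc) {
        bufferedDocs.add(doc);
    }

    int numBufferedDocs() {
        return bufferedDocs.size();
    }
}

/** Hypothetical binder: the only shared object; it hands each thread its own writer. */
final class DocumentsWriterThreadBinder {
    private final ConcurrentHashMap<Thread, PerThreadDocumentsWriter> writers =
            new ConcurrentHashMap<>();

    void addDocument(String doc) {
        // computeIfAbsent is atomic, so each thread gets exactly one writer.
        writers.computeIfAbsent(Thread.currentThread(), t -> new PerThreadDocumentsWriter())
               .addDocument(doc);
    }

    int numWriters() {
        return writers.size();
    }

    /** Only safe once the indexing threads have been joined (join gives happens-before). */
    int totalBufferedDocs() {
        int total = 0;
        for (PerThreadDocumentsWriter w : writers.values()) {
            total += w.numBufferedDocs();
        }
        return total;
    }
}
```

Because no two threads share a per-thread writer, the later LUCENE-2312 question shrinks to making each writer's buffers visible to readers.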
Re: #lucene IRC log [was: RE: lucene and solr trunk]
On 03/16/2010 02:57 PM, Grant Ingersoll wrote: On Mar 16, 2010, at 2:47 PM, Steven A Rowe wrote: On 03/16/2010 at 6:06 AM, Michael McCandless wrote: Does anyone know how other projects fold in IRC...? I gather from the deafening silence that we'll have to figure it out as we go... I think some (not all) of the discomfort associated with IRC could be addressed with a permanent, searchable, linkable archive of #lucene. I went looking for IRC loggers and found http://colabti.org/. One of the things hosted there is a searchable, linkable permanent archive of several freenode channels. I posted on #irclogger asking about hosting #lucene archive, and apparently all we have to do is ask, after first determining that nobody objects. Here's a link (not incidentally, this is exactly what we will have for #lucene once the service is switched on): http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 So, would anybody participating on #lucene object to a permanent archive? (I'm also going to provide a link to this thread on #lucene to make sure everybody there knows about the issue.) There's also a lot of chatter that happens on IRC, so logging is going to have a lot of noise. I'm still on the fence on what to do. I don't want to get in people's way, but we also need to have traceability about decisions, and we certainly can't have answers like "We discussed this on IRC and you missed it, too bad" happening (not saying that has happened, just saying I don't want to see it). -Grant Even with logging, I'm against using IRC for making decisions, or as something people can point to. Even with searchable logging, I think we should stick with, "if it didn't happen on the lists, it didn't happen". It's the same as when some of us get together and talk about Lucene and Solr - that's great stuff - you can get a lot done that is a lot harder on the lists - you can hash a lot out. 
But I think people should always have the right to act like it didn't happen - the same as if we are at ApacheCon or something - we don't come back and say, sorry, you missed all the discussion, but we had one and this is what we are going to do. We summarize the discussion on the list (like Mike likes to do with IRC), and answer questions as people have them. I personally think it's great to come to mini agreements with real-time talk - then it just has to make its way through the list. This isn't a counterpoint to anything you said Grant, just a nice place for me to drop this. -- - Mark http://www.lucidimagination.com
Re: #lucene IRC log [was: RE: lucene and solr trunk]
It'd be very cool to have a searchable archive for the IRC discussions, so +1. But at the same time can we make sure that the decisions that are made on IRC are still being described in a jira issue? I don't mean that people should repeat brainstorming, but if a discussion leads to opening a Jira issue, it'd be good to understand the reasons and details without having to search the IRC log. Only someone who wants to know more, e.g. what led to the discussion, what other ideas were discarded, etc., should have to go to the IRC log. Michael On 3/16/10 11:58 AM, Michael McCandless wrote: +1, this looks great! Mike On Tue, Mar 16, 2010 at 1:52 PM, Andi Vajda wrote: On Mar 16, 2010, at 11:47, Steven A Rowe wrote: On 03/16/2010 at 6:06 AM, Michael McCandless wrote: Does anyone know how other projects fold in IRC...? I gather from the deafening silence that we'll have to figure it out as we go... I think some (not all) of the discomfort associated with IRC could be addressed with a permanent, searchable, linkable archive of #lucene. I went looking for IRC loggers and found http://colabti.org/. One of the things hosted there is a searchable, linkable permanent archive of several freenode channels. I posted on #irclogger asking about hosting #lucene archive, and apparently all we have to do is ask, after first determining that nobody objects. Here's a link (not incidentally, this is exactly what we will have for #lucene once the service is switched on): http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 So, would anybody participating on #lucene object to a permanent archive? No objections on my part. I think this is essential. Andi.. (I'm also going to provide a link to this thread on #lucene to make sure everybody there knows about the issue.)
Steve
Re: #lucene IRC log [was: RE: lucene and solr trunk]
+1, this looks great! Mike On Tue, Mar 16, 2010 at 1:52 PM, Andi Vajda wrote: > > On Mar 16, 2010, at 11:47, Steven A Rowe wrote: > >> On 03/16/2010 at 6:06 AM, Michael McCandless wrote: >>> >>> Does anyone know how other projects fold in IRC...? >> >> I gather from the deafening silence that we'll have to figure it out as we >> go... >> >> I think some (not all) of the discomfort associated with IRC could be >> addressed with a permanent, searchable, linkable archive of #lucene. >> >> I went looking for IRC loggers and found http://colabti.org/. One of the >> things hosted there is a searchable, linkable permanent archive of several >> freenode channels. I posted on #irclogger asking about hosting #lucene >> archive, and apparently all we have to do is ask, after first determining >> that nobody objects. Here's a link (not incidentally, this is exactly what >> we will have for #lucene once the service is switched on): >> >> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 >> >> So, would anybody participating on #lucene object to a permanent archive? > > No objections on my part. I think this is essential. > > Andi.. > >> >> (I'm also going to provide a link to this thread on #lucene to make sure >> everybody there knows about the issue.) >> >> Steve
Re: #lucene IRC log [was: RE: lucene and solr trunk]
On Mar 16, 2010, at 2:47 PM, Steven A Rowe wrote: > On 03/16/2010 at 6:06 AM, Michael McCandless wrote: >> Does anyone know how other projects fold in IRC...? > > I gather from the deafening silence that we'll have to figure it out as we > go... > > I think some (not all) of the discomfort associated with IRC could be > addressed with a permanent, searchable, linkable archive of #lucene. > > I went looking for IRC loggers and found http://colabti.org/. One of the > things hosted there is a searchable, linkable permanent archive of several > freenode channels. I posted on #irclogger asking about hosting #lucene > archive, and apparently all we have to do is ask, after first determining > that nobody objects. Here's a link (not incidentally, this is exactly what > we will have for #lucene once the service is switched on): > > http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 > > So, would anybody participating on #lucene object to a permanent archive? > > (I'm also going to provide a link to this thread on #lucene to make sure > everybody there knows about the issue.) There's also a lot of chatter that happens on IRC, so logging is going to have a lot of noise. I'm still on the fence on what to do. I don't want to get in people's way, but we also need to have traceability about decisions, and we certainly can't have answers like "We discussed this on IRC and you missed it, too bad" happening (not saying that has happened, just saying I don't want to see it). -Grant
Re: #lucene IRC log [was: RE: lucene and solr trunk]
On Mar 16, 2010, at 11:47, Steven A Rowe wrote: On 03/16/2010 at 6:06 AM, Michael McCandless wrote: Does anyone know how other projects fold in IRC...? I gather from the deafening silence that we'll have to figure it out as we go... I think some (not all) of the discomfort associated with IRC could be addressed with a permanent, searchable, linkable archive of #lucene. I went looking for IRC loggers and found http://colabti.org/. One of the things hosted there is a searchable, linkable permanent archive of several freenode channels. I posted on #irclogger asking about hosting #lucene archive, and apparently all we have to do is ask, after first determining that nobody objects. Here's a link (not incidentally, this is exactly what we will have for #lucene once the service is switched on): http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 So, would anybody participating on #lucene object to a permanent archive? No objections on my part. I think this is essential. Andi.. (I'm also going to provide a link to this thread on #lucene to make sure everybody there knows about the issue.) Steve
#lucene IRC log [was: RE: lucene and solr trunk]
On 03/16/2010 at 6:06 AM, Michael McCandless wrote: > Does anyone know how other projects fold in IRC...? I gather from the deafening silence that we'll have to figure it out as we go... I think some (not all) of the discomfort associated with IRC could be addressed with a permanent, searchable, linkable archive of #lucene. I went looking for IRC loggers and found http://colabti.org/. One of the things hosted there is a searchable, linkable permanent archive of several freenode channels. I posted on #irclogger asking about hosting #lucene archive, and apparently all we have to do is ask, after first determining that nobody objects. Here's a link (not incidentally, this is exactly what we will have for #lucene once the service is switched on): http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2 So, would anybody participating on #lucene object to a permanent archive? (I'm also going to provide a link to this thread on #lucene to make sure everybody there knows about the issue.) Steve
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846037#action_12846037 ] Jason Rutherglen commented on LUCENE-2324: -- Are there going to be issues with the char array buffers as well (ie, will we also need to flush them for concurrency)?
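One way to approach the char buffers is to reuse the snapshot idea from the LUCENE-2312 comment on ByteBlockPool. After a flush, the writer publishes a read-only copy of the outer array of block pointers (the blocks themselves are shared, not copied) together with a bound on how much of them is safe to read. A minimal Java sketch follows, assuming a single writer thread. `CharBlockPoolSketch`, its tiny block size, and its `flush()` method are hypothetical and are not Lucene's CharBlockPool API.

```java
import java.util.Arrays;

/**
 * Hypothetical char block pool with read-only snapshots (not Lucene code).
 * One writer thread appends and flushes; any thread may call readView().
 */
final class CharBlockPoolSketch {
    static final int BLOCK_SIZE = 4; // deliberately tiny so blocks fill up quickly

    /** Immutable reader view: shared blocks plus a bound on what may be read. */
    static final class ReadView {
        final char[][] blocks; // copy of the outer array only; blocks are shared
        final int length;      // chars [0, length) were written before publication

        ReadView(char[][] blocks, int length) {
            this.blocks = blocks;
            this.length = length;
        }

        char charAt(int index) {
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index " + index + ", length " + length);
            }
            return blocks[index / BLOCK_SIZE][index % BLOCK_SIZE];
        }
    }

    private char[][] buffers = new char[1][]; // writer-private outer array
    private int length;                       // chars written so far (writer-private)
    private volatile ReadView view = new ReadView(new char[0][], 0);

    /** Writer thread only. */
    void append(char c) {
        int block = length / BLOCK_SIZE;
        if (block == buffers.length) {
            buffers = Arrays.copyOf(buffers, buffers.length * 2); // grow pointers only
        }
        if (buffers[block] == null) {
            buffers[block] = new char[BLOCK_SIZE];
        }
        buffers[block][length % BLOCK_SIZE] = c;
        length++;
    }

    /** Writer thread only: publishes everything appended so far (volatile write). */
    void flush() {
        view = new ReadView(Arrays.copyOf(buffers, buffers.length), length);
    }

    /** Any thread. */
    ReadView readView() {
        return view;
    }
}
```

The writer keeps filling shared blocks past the published length after a flush. Readers never index beyond `length`, so those later writes do not race with anything a reader touches.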
Re: lucene and solr trunk
Hi

My only concern w/ how SVN might end up organized is that I should still be able to check out core lucene independently of Solr (and possibly contrib/modules) and then build and test it. Having a separate project in Eclipse is important as well.

How about this structure:

/solr/trunk
/lucene/trunk
/modules/trunk (can be left out if we don't think it's necessary)

This should allow us to:
1) Release each and every one of them independently.
2) Introduce dependencies between them: modules -> lucene and Solr -> modules + lucene, as, IMO, it should be. Lucene is core, modules extend it, and Solr extends and uses both.
3) Allow one to check out exactly what one needs to work on.
4) Modules will always depend on a certain lucene version, either a cut release or trunk. When modules are released, their build.xml will be changed as part of the release process to point to the lucene release (not trunk!) they support and depend on.
5) Same for Solr. When a patch for Solr needs to change code in lucene, it is done in both, by two different patches. Both are committed within the same issue. Since each trunk can depend on the other's trunk, this shouldn't be a problem. Indeed, it will complicate the build.xml files a bit, as is done today for core lucene and backwards. But that's ok I think. I don't expect all Solr issues to require a change in lucene, just as not all modules issues will. So that change to the build.xml should not be a frequent operation.

Another thing this will change (and I think for the better) is that a Solr release might require cutting lucene and modules releases, and I think we should be flexible about that. This also is not something I think will be frequent ... like today, Solr could still be limited to a certain lucene release or trunk revision.

I still think this is in line w/ one project, one codebase, just different levels of the really big parts (Solr, lucene and modules). Committers can be given access to which will give them access to everything.
Others (modules-committers) can be given access to just that folder (hijacking a bit from the other thread). The flexibility of being able to check out lucene code only is important, at least to me. I wouldn't want to lose it.

On the IRC stuff - I know that we cannot prevent anyone from discussing issues anywhere, and I respect that freedom. It's just that some time ago I was told that I shouldn't hold 'private' discussions on Lucene, outside the community. I know that this IRC channel, called #lucene, is not completely outside the community, but here's how it looks to an outsider (not on IRC):
1) An issue is opened w/ the comment "summarizing discussion on IRC ...".
2) Then a couple of hours (or days) later, a new comment: "more discussion summary on IRC".
3) Then some comments, some from people who are not on IRC.
4) Then more comments (from an IRC-er): "ok we've discussed this and here's what we came up with ..."

Feels like we're on a need-to-know basis here. Remember that when a discussion is fully open, you might have some comments on what was said in the process. When you are given the final decision, or a summary, you cannot comment on what you weren't told. That's a bit frustrating ... though I'm trying very hard to be involved w/ the mailing list, it feels like I miss TONS of discussions on IRC ... and what seems worse (as I read somewhere in the thread) is that you can open an issue w/ an idea (as happened to me), just to discover the folks on IRC took it all the way to design and impl proposals, and I was left to read the summary ...

So by no means am I trying to suggest that IRC discussions should stop. I don't, can't and won't ever have control over that, just as I cannot stop two people sitting in adjacent rooms from discussing issues or Lucene outside the list. But I'd feel better if, once a discussion makes it to the list or an issue, it were conducted there from then on, and not as snippets/summaries of the IRC discussion. Can we keep at least that?
I don't want to get people off their seats w/ that request :). I'm not even sure I'm in a position to make such requests :). But I'd appreciate it if it could at least be discussed (not on IRC).

Shai

On Tue, Mar 16, 2010 at 5:48 PM, Grant Ingersoll wrote: > > On Mar 16, 2010, at 10:18 AM, Mark Miller wrote: > > > On 03/16/2010 10:09 AM, Yonik Seeley wrote: > >> On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch > wrote: > >> > >>> Also, we're in review-and-commit process, not commit-and-review. > Changes have to be > >>> proposed, discussed and ideally attached to jira as patches first. > >>> > >> Correction, just for the sake of avoiding future confusion (i.e. I'm > >> not making any point about this thread): > >> > >> Lucene and Solr have always officially been CTR. > >> For trunk, we normally use a bit of informal lazy consensus for > >> anything big, hard, or that might be controvertial... but we are not > >> officially RTC. > >> > >> -Yonik > >> > >> -
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846028#action_12846028 ] Jason Rutherglen commented on LUCENE-2324: -- Carrying over from LUCENE-2312. For starters, I'm proposing we have a byte slice writer that can lock, move or copy(?) the bytes from the writable byte pool/writer to a read-only byte block pool, and then unlock. This sounds like a fairly self-contained thing that can be unit tested at a low level. Mike, can you add a bit about how this could work? Also, what is the IntBlockPool used for?
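A minimal sketch of the lock / copy / unlock idea, assuming a single append-only writer. The names (`SnapshotByteBlockPool`, `append`, `flush`, `ReadOnlyView`) are hypothetical and this is not Lucene's `ByteBlockPool` API. Only the first dimension of the block array (the block pointers) is copied, so a flush stays cheap; the view records the byte count at flush time and never reads past it, so bytes the writer appends afterwards into a shared, partially filled block stay invisible to it. The writer path takes the same lock as the flush, in line with the mutex idea raised on LUCENE-2312.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Lucene's ByteBlockPool: an append-only byte pool whose
// written bytes can be "flushed" to readers as a read-only snapshot.
class SnapshotByteBlockPool {
    static final int BLOCK_SIZE = 8; // tiny for illustration; real pools use far larger blocks

    // Read-only view: a copy of the block pointers plus the number of valid bytes.
    static final class ReadOnlyView {
        private final byte[][] blocks; // shares the byte[] blocks with the writer
        private final int length;      // bytes visible through this view

        ReadOnlyView(byte[][] blocks, int length) {
            this.blocks = blocks;
            this.length = length;
        }

        int length() {
            return length;
        }

        byte byteAt(int pos) {
            // Never read past the flush point: later appends may land in a shared block.
            if (pos < 0 || pos >= length) {
                throw new IndexOutOfBoundsException("pos=" + pos + ", length=" + length);
            }
            return blocks[pos / BLOCK_SIZE][pos % BLOCK_SIZE];
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private byte[][] blocks = new byte[2][];
    private int length; // total bytes appended so far

    // Writer path: appends bytes while holding the same lock that flush() uses.
    void append(byte[] data) {
        lock.lock();
        try {
            for (byte b : data) {
                int blockIndex = length / BLOCK_SIZE;
                if (blockIndex == blocks.length) {
                    blocks = Arrays.copyOf(blocks, blocks.length * 2); // grow the pointer array only
                }
                if (blocks[blockIndex] == null) {
                    blocks[blockIndex] = new byte[BLOCK_SIZE];
                }
                blocks[blockIndex][length % BLOCK_SIZE] = b;
                length++;
            }
        } finally {
            lock.unlock();
        }
    }

    // Flush: lock, copy only the first dimension (block pointers), unlock.
    // Hand the returned view to reader threads through a safe-publication point,
    // e.g. a volatile field.
    ReadOnlyView flush() {
        lock.lock();
        try {
            return new ReadOnlyView(Arrays.copyOf(blocks, blocks.length), length);
        } finally {
            lock.unlock();
        }
    }
}
```

The design choice here is that bytes below a snapshot's length are never rewritten, so sharing the underlying byte arrays is safe without copying their contents.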
[jira] Updated: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2326: -- Fix Version/s: 3.1 Flex Branch I think the ideal case for this would be that the backwards folder simply contains the src-folder of the previous branch (after creation). No extra folder in between like now, so it looks like "/backwards/src/...". After a release, one would first "svn rm" the old one and then "svn copy" the src folder of the previously created release branch to trunk. I would add this to the release TODO list. With this change, all committers must first manually do an operating-system "rm -rf" on the backwards folder, by calling "ant clean-backwards", before running svn up. Maybe create a patch first :-) > Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards > branch and linking snowball tests by svn:externals > --- > > Key: LUCENE-2326 > URL: https://issues.apache.org/jira/browse/LUCENE-2326 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: Flex Branch, 3.1 > > > As we often need to update backwards tests together with trunk and always > have to update the branch first, record rev no, and update build xml, I would > simply like to do a svn copy/move of the backwards branch. > After a release, this is simply also done: > {code} > svn rm backwards > svn cp releasebranch backwards > {code} > By this we can simply commit in one pass, create patches in one pass. > The snowball tests are currently downloaded by svn.exe, too. These need a > fixed version for checkout. I would like to change this to use svn:externals. > Will provide patch, soon.
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846002#action_12846002 ] Shai Erera commented on LUCENE-2310: I agree. Then supporting both the deprecated and the new API should be easy. > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality that storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possible Fieldable), moving much of the functionality into Field and > FieldType.
[jira] Commented: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
[ https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846000#action_12846000 ] Robert Muir commented on LUCENE-2326: - I agree, I think it's nice to see a patch to lucene that includes any changes to the backwards tests. Mike did this with LUCENE-2111 and I was shocked, until I found out he was doing it manually with cat.
[jira] Created: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals
Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals --- Key: LUCENE-2326 URL: https://issues.apache.org/jira/browse/LUCENE-2326 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler As we often need to update the backwards tests together with trunk, and always have to update the branch first, record the revision number and update build.xml, I would simply like to do an svn copy/move of the backwards branch. After a release, this is also simple to do:
{code}
svn rm backwards
svn cp releasebranch backwards
{code}
This way we can commit in one pass and create patches in one pass. The snowball tests are currently downloaded by svn.exe, too. These need a fixed version for checkout. I would like to change this to use svn:externals. Will provide a patch soon.
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845978#action_12845978 ] Michael Busch commented on LUCENE-2312: --- {quote} I think we simply need a way to publish byte arrays to all threads? Michael B. can you post something of what you have so we can get an idea of how your system will work (ie, mainly what the assumptions are)? {quote} It's kinda complicated to explain and currently differs a lot from Lucene's TermHash classes. I'd prefer to wait a little bit until I have verified that my solution works. I think here we should really tackle LUCENE-2324 first - it's a prereq. Wanna help with that, Jason?
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845971#action_12845971 ] Jason Rutherglen commented on LUCENE-2312: -- To clarify the above comment, DW's update doc method would acquire a mutex. The flush bytes method would also acquire that mutex when it copies the existing writable bytes over to the readable bytes structure (pool?).
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845972#action_12845972 ] Chris Male commented on LUCENE-2310: I recommend we keep it as a List, since that makes it easier to offer different iterators based on FieldType criteria. A Map would support get and remove better, but I think we want to move people to using Iterators, and the remove method is there for a case we don't know of yet. I'll create a patch with these ideas shortly. Cheers!
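A minimal sketch of why a List keeps this flexible, using hypothetical `SketchField`/`SketchDocument` classes (two boolean flags stand in for FieldType; this is not Lucene's API, and Java 8 lambdas are used for brevity). Any number of criteria-based iterators can be layered over one insertion-ordered list, for example `doc.iterator(f -> f.stored)` to walk only stored fields.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a field carrying two FieldType-like flags.
final class SketchField {
    final String name;
    final boolean stored;
    final boolean indexed;

    SketchField(String name, boolean stored, boolean indexed) {
        this.name = name;
        this.stored = stored;
        this.indexed = indexed;
    }
}

// Hypothetical List-backed document: insertion order is kept, and any number of
// criteria-based iterators can be layered over the same backing list.
class SketchDocument implements Iterable<SketchField> {
    private final List<SketchField> fields = new ArrayList<>();

    void add(SketchField field) {
        fields.add(field);
    }

    @Override
    public Iterator<SketchField> iterator() {
        return fields.iterator();
    }

    // Iterates only over the fields matching the criterion, in insertion order.
    Iterator<SketchField> iterator(Predicate<SketchField> criterion) {
        return fields.stream().filter(criterion).iterator();
    }

    // Removes every field with the given name and returns how many were removed.
    int removeFields(String name) {
        int before = fields.size();
        fields.removeIf(f -> f.name.equals(name));
        return before - fields.size();
    }

    int numFields() {
        return fields.size();
    }
}
```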
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845968#action_12845968 ] Shai Erera commented on LUCENE-2310: That was usually the approach: you provide new methods and deprecate the old ones, but both keep working, not in an either/or (XOR) mode. We need to ensure that if people call both, they still function properly. Unless this has changed, in which case it should be clearly documented. But I don't think it is a big problem to support both? If Document still keeps its fields in a List then all should remain the same. We could add a 4.0 note to switch to a Map-based data structure to better support remove, but that's questionable because we'd need to maintain the ordering of the fields (the order in which they were inserted). Personally I don't think the order matters much to users, but that's the current implementation.
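To make the ordering concern concrete, here is a small sketch with plain Java collections (field names and values are made up) comparing a List, which Shai's comment says Document keeps today, with a name-keyed `LinkedHashMap`. The map preserves the first-insertion order of each *name*, but fields that were interleaved across names come back grouped by name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldOrderDemo {
    // Flattens fields the way a List-backed document returns them: exact insertion order.
    static List<String> viaList(String[][] fields) {
        List<String> out = new ArrayList<>();
        for (String[] f : fields) {
            out.add(f[0] + "=" + f[1]);
        }
        return out;
    }

    // Flattens fields after grouping them by name in a LinkedHashMap.
    static List<String> viaMap(String[][] fields) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String[] f : fields) {
            byName.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byName.entrySet()) {
            for (String value : e.getValue()) {
                out.add(e.getKey() + "=" + value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] fields = {{"title", "a"}, {"body", "b"}, {"title", "c"}};
        System.out.println(viaList(fields)); // [title=a, body=b, title=c]
        System.out.println(viaMap(fields));  // [title=a, title=c, body=b]
    }
}
```

So a Map-based switch would only be transparent if nobody depends on the interleaved order across different field names.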
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845969#action_12845969 ] Michael Busch commented on LUCENE-2312: --- {quote} I thought we're moving away from byte block pooling and we're going to try relying on garbage collection? Does a volatile object[] publish changes to all threads? Probably not, again it'd just be the pointer. {quote} So far we were only considering moving away from pooling of (Raw)PostingList objects. Pooling byte blocks might have more performance impact - they're more heavyweight.
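For contrast with relying on garbage collection, a minimal sketch of what pooling byte blocks means, using a hypothetical `RecyclingByteBlockAllocator` (not Lucene's allocator classes). Blocks come from a bounded free list when one is available and are only created with `new` otherwise. With NRT readers in the picture, the catch shows up in `recycle`: a block may only be returned once no published reader snapshot can still reach it, which is the bookkeeping the GC would otherwise do for us.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical fixed-size byte block allocator that recycles blocks through a
// bounded free list instead of leaving every block to the garbage collector.
class RecyclingByteBlockAllocator {
    private final int blockSize;
    private final int maxFreeBlocks;
    private final ArrayDeque<byte[]> freeBlocks = new ArrayDeque<>();
    private int newAllocations; // how many blocks were created with `new`

    RecyclingByteBlockAllocator(int blockSize, int maxFreeBlocks) {
        this.blockSize = blockSize;
        this.maxFreeBlocks = maxFreeBlocks;
    }

    synchronized byte[] getBlock() {
        byte[] block = freeBlocks.poll(); // null when the free list is empty
        if (block != null) {
            return block;
        }
        newAllocations++;
        return new byte[blockSize];
    }

    // Caller must guarantee that no reader (e.g. a published NRT snapshot) still references the block.
    synchronized void recycle(byte[] block) {
        if (block.length != blockSize) {
            throw new IllegalArgumentException("wrong block size: " + block.length);
        }
        if (freeBlocks.size() < maxFreeBlocks) {
            Arrays.fill(block, (byte) 0); // clear stale data before reuse
            freeBlocks.push(block);
        } // otherwise drop it and let the GC reclaim it
    }

    synchronized int newAllocations() {
        return newAllocations;
    }
}
```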
[jira] Commented: (LUCENE-1488) multilingual analyzer based on icu
[ https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845951#action_12845951 ] Robert Muir commented on LUCENE-1488: - Thanks for the review, Uwe! Moving forward... > multilingual analyzer based on icu > -- > > Key: LUCENE-1488 > URL: https://issues.apache.org/jira/browse/LUCENE-1488 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 3.1 > > Attachments: ICUAnalyzer.patch, LUCENE-1488.patch, LUCENE-1488.patch, > LUCENE-1488.patch, LUCENE-1488.patch, LUCENE-1488.txt, LUCENE-1488.txt > > > The standard analyzer in lucene is not exactly unicode-friendly with regards > to breaking text into words, especially with respect to non-alphabetic > scripts. This is because it is unaware of unicode bounds properties. > I actually couldn't figure out how the Thai analyzer could possibly be > working until i looked at the jflex rules and saw that codepoint range for > most of the Thai block was added to the alphanum specification. defining the > exact codepoint ranges like this for every language could help with the > problem but you'd basically be reimplementing the bounds properties already > stated in the unicode standard. > in general it looks like this kind of behavior is bad in lucene for even > latin, for instance, the analyzer will break words around accent marks in > decomposed form. While most latin letter + accent combinations have composed > forms in unicode, some do not. (this is also an issue for asciifoldingfilter > i suppose). > I've got a partially tested standardanalyzer that uses icu Rule-based > BreakIterator instead of jflex. Using this method you can define word > boundaries according to the unicode bounds properties.
After getting it into > some good shape i'd be happy to contribute it for contrib but I wonder if > theres a better solution so that out of box lucene will be more friendly to > non-ASCII text. Unfortunately it seems jflex does not support use of these > properties such as [\p{Word_Break = Extend}] so this is probably the major > barrier. > Thanks, > Robert
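To illustrate word breaking driven by Unicode boundary rules rather than hand-maintained code-point ranges, a small sketch using the JDK's own `java.text.BreakIterator`. This is a stand-in: the patch described above uses ICU's rule-based BreakIterator, whose rules and Unicode coverage differ from the JDK's, so treat the JDK as an approximation. The decomposed "café" in the test input ("cafe" followed by U+0301 COMBINING ACUTE ACCENT) is expected to stay one token, because the JDK's word rules do not break before a combining mark.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: word segmentation with the JDK's rule-based word BreakIterator
// (a stand-in for the ICU BreakIterator discussed in LUCENE-1488).
class WordBreakDemo {
    // Returns the segments between word boundaries that contain at least one
    // letter or digit, i.e. skips whitespace and punctuation segments.
    static List<String> words(String text) {
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
        boundaries.setText(text);
        List<String> out = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "cafe" + U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed "café"
        System.out.println(words("cafe\u0301 au lait"));
    }
}
```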
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845950#action_12845950 ] Jason Rutherglen commented on LUCENE-2312: -- {quote}The tricky part is to make sure that a reader always sees a consistent snapshot of the index. At the same time a reader must not follow pointers to non-published locations (e.g. array blocks). {quote} Right. In what case, along the term enum -> term docs chain used for doc scoring, would a reader potentially try to follow a pointer to a byte array that doesn't exist? I think we're strictly preventing that via the last doc IDs? Also, when we flush, I think we need to block further doc writing (via an RW lock?), wait for any docs currently being written to complete, then forcibly publish the byte arrays, then release the write lock? This way we always have published data that's consistent for readers (e.g., the inverted index can be read completely, and there won't be any stray writes still occurring to a byte array that's been published).
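A minimal sketch of that flush sequence, assuming `java.util.concurrent.locks.ReentrantReadWriteLock` and a hypothetical `PublishGate` class; the doc counter stands in for "publish the byte arrays" and is illustrative only. Note the inverted use of the lock: document writes take the shared (read) side, and publishing takes the exclusive (write) side, so a publish waits for in-flight documents and blocks new ones until it is done.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the RW-lock proposal: document writes hold the shared
// (read) lock; publishing holds the exclusive (write) lock.
class PublishGate {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final AtomicInteger docsWritten = new AtomicInteger(); // shared-lock holders may run concurrently
    private volatile int publishedDocCount; // what readers are allowed to see

    void addDocument(Runnable writeDocument) {
        rw.readLock().lock();
        try {
            writeDocument.run(); // e.g. append this document's bytes to the RAM buffer
            docsWritten.incrementAndGet();
        } finally {
            rw.readLock().unlock();
        }
    }

    int publish() {
        rw.writeLock().lock(); // waits for in-flight addDocument calls and blocks new ones
        try {
            // A real implementation would publish the byte arrays here; this sketch
            // only records how many complete documents readers may see.
            publishedDocCount = docsWritten.get();
            return publishedDocCount;
        } finally {
            rw.writeLock().unlock();
        }
    }

    int publishedDocCount() {
        return publishedDocCount;
    }
}
```

With one DocumentsWriter per thread there would be at most one shared-lock holder per writer, but the atomic counter keeps the sketch correct even if several threads add documents concurrently.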
Re: lucene and solr trunk
On Mar 16, 2010, at 10:18 AM, Mark Miller wrote: > On 03/16/2010 10:09 AM, Yonik Seeley wrote: >> On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote: >> >>> Also, we're in review-and-commit process, not commit-and-review. Changes >>> have to be >>> proposed, discussed and ideally attached to jira as patches first. >>> >> Correction, just for the sake of avoiding future confusion (i.e. I'm >> not making any point about this thread): >> >> Lucene and Solr have always officially been CTR. >> For trunk, we normally use a bit of informal lazy consensus for >> anything big, hard, or that might be controvertial... but we are not >> officially RTC. >> >> -Yonik >> > > In any case, this is a branch. People really want to enforce RTC on a > branch??? Even if that was our official process on trunk (which I agree it > has not been) that's not how the flex branch worked. That's not how the > solr_cloud branch worked. That's not how other previous branches have worked. > > IMO - anyone should be able to create a branch for anything - to play around > with whatever they want. We should encourage this. Branches are good. And > they take up little space. > +1. Furthermore, it is incumbent on the people working on the branch to then present and discuss when/how to merge to trunk, just like any big patch.
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845943#action_12845943 ] Jason Rutherglen commented on LUCENE-2312: -- I thought we were moving away from byte block pooling and were going to try relying on garbage collection? Does a volatile Object[] publish changes to all threads? Probably not; again, it'd just be the pointer. In the case of postings/TermDocs iteration, I'm more concerned that lastDocID be volatile than with the byte array containing extra data. Extra docs are OK in the byte array because we'll simply stop iterating when we've reached the last doc. Though with our system we shouldn't even run into this either. What if a byte array is copied and published while the master byte array is still being written to, and then the same byte array (by id or something) is published again? Then we'd have multiple versions of byte arrays. That could be bad. Because there is one DW per thread, there's only one document being indexed at a time. There's no writer concurrency. This leaves reader concurrency. However, after each doc we *could* simply flush all bytes related to the doc. Any new docs must simply start writing to new byte arrays? The problem with this is that, unless the byte arrays are really small, we'll have a lot of extra data around, well, unless the byte arrays are trimmed before publication. Or we can simply RW lock (or use some other analogous mechanism on) individual byte arrays, not publish them after each doc, and only publish them when getReader is called. To clarify, the RW lock (or flag) would only be per byte array; in fact, all writing to the byte array could necessarily cease on flush, and new byte arrays would be allocated. The published byte array could point to the next byte array. I think we simply need a way to publish byte arrays to all threads? Michael B., can you post something of what you have so we can get an idea of how your system will work (i.e., mainly what the assumptions are)? We do need to strive for correctness of data, and perhaps performance will be slightly impacted (though compared with our current NRT we'll have an overall win).
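A minimal Java sketch of the publication question raised above, assuming one writer thread per DocumentsWriter and any number of reader threads. All names here (PublishedIntBlocks, Snapshot, publish()) are hypothetical, and ints stand in for postings bytes; this is not Lucene's ByteBlockPool. Declaring an array field volatile only makes the reference volatile, but under the Java Memory Model every plain write the writer makes before a volatile write is visible to a reader that later reads that same volatile field. So publishing an immutable (block pointers, upto) pair through one volatile field should let readers safely read everything below upto while the writer keeps appending past it.

```java
import java.util.Arrays;

/** Hypothetical single-writer / multi-reader append-only int buffer stored in fixed-size blocks. */
final class PublishedIntBlocks {
    static final int BLOCK_SIZE = 4; // deliberately tiny so a few values span several blocks

    /** Immutable view for readers: the block pointers plus how many values are valid. */
    static final class Snapshot {
        final int[][] blocks;
        final int upto;

        Snapshot(int[][] blocks, int upto) {
            this.blocks = blocks;
            this.upto = upto;
        }

        int get(int i) {
            if (i < 0 || i >= upto) {
                throw new IndexOutOfBoundsException("index " + i + " outside [0, " + upto + ")");
            }
            return blocks[i / BLOCK_SIZE][i % BLOCK_SIZE];
        }
    }

    // Writer-private state: only the single indexing thread touches these fields.
    private int[][] blocks = new int[0][];
    private int count;

    // The volatile write of this reference is the publication point: plain writes made
    // before it are visible to any thread that subsequently reads this field.
    private volatile Snapshot published = new Snapshot(new int[0][], 0);

    /** Writer thread only. */
    void append(int value) {
        if (count == blocks.length * BLOCK_SIZE) {
            // Copy only the first dimension (the block pointers); existing blocks are shared,
            // and the old pointer array is never mutated, so older snapshots stay valid.
            int[][] grown = Arrays.copyOf(blocks, blocks.length + 1);
            grown[blocks.length] = new int[BLOCK_SIZE];
            blocks = grown;
        }
        blocks[count / BLOCK_SIZE][count % BLOCK_SIZE] = value;
        count++;
    }

    /** Writer thread calls this after a doc is complete, or when a reader is requested. */
    void publish() {
        published = new Snapshot(blocks, count);
    }

    /** Any thread: returns a stable view that never changes underneath the caller. */
    Snapshot snapshot() {
        return published;
    }
}
```

Readers never read at or past upto, so the writer's later writes into the tail of the last block do not race with them. That matches the "stop iterating at the last doc" idea above, with the volatile snapshot playing the role of a volatile lastDocID.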
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845939#action_12845939 ] Chris Male commented on LUCENE-2310: {quote} So overall we agree on the changes that need to be made. BTW, when you deprecate a method, you usually change it to call the new API or change it to use the new data structures or whatever. So we need to think how to impl getFields such that if one calls remove, numFields or uses the iterator in an interleaving manner, his code doesn't break ... I don't think it should be hard but it might be a good idea to even write such a (deprecated) unit test {quote} I'm not sure we have to change getFields. We can just deprecate it, and point people to the new methods. I think it'd be more effort than it's worth to create a List impl that calls the new methods. Was that what you were implying? I do agree it's worth writing a test to ensure all old functionality can be done via the new methods somehow. > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch > > > In order to move field-type-like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality than storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possibly Fieldable), moving much of the functionality into Field and > FieldType.
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845936#action_12845936 ] Shai Erera commented on LUCENE-2310: I'm sorry for the confusion - I got so used to all the deprecation discussions that it's embedded in my replies :) - when I wrote "instead getFields" I meant that it will be deprecated, and we'll carry it w/ us until 4.0 is out. So overall we agree on the changes that need to be made. BTW, when you deprecate a method, you usually change it to call the new API or change it to use the new data structures or whatever. So we need to think how to impl getFields such that if one calls remove, numFields or uses the iterator in an interleaving manner, his code doesn't break ... I don't think it should be hard but it might be a good idea to even write such a (deprecated) unit test
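One way to get the interleaving behavior Shai describes without writing a separate List implementation, sketched in Java under the assumption that the new methods and the deprecated getFields() can share a single backing list. SketchDocument, SketchField, add(), removeField() and numFields() are hypothetical names for illustration, not the API in the LUCENE-2310 patches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a field: just a name and a string value. */
final class SketchField {
    final String name;
    final String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/**
 * Hypothetical document whose new-style methods and deprecated getFields()
 * operate on the same list, so old and new calls can be freely interleaved.
 */
final class SketchDocument implements Iterable<SketchField> {
    private final List<SketchField> fields = new ArrayList<>();

    void add(SketchField f) {
        fields.add(f);
    }

    /** Removes the first field with the given name; returns true if one was removed. */
    boolean removeField(String name) {
        for (Iterator<SketchField> it = fields.iterator(); it.hasNext(); ) {
            if (it.next().name.equals(name)) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    int numFields() {
        return fields.size();
    }

    @Override
    public Iterator<SketchField> iterator() {
        return fields.iterator();
    }

    /** Old API kept for back-compat: returns the live backing list, not a copy. */
    @Deprecated
    List<SketchField> getFields() {
        return fields;
    }
}
```

Because getFields() hands back the live list, mutations through the old API are immediately visible to numFields() and the iterator, and vice versa; the cost is that the deprecated method exposes internal storage until it is removed.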
[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845934#action_12845934 ] Robert Muir commented on LUCENE-2098: - I think the best way to proceed would be to make it easy to benchmark CharFilters in contrib/benchmark, especially this HTML stripping one. Honestly, we don't even know for sure yet that the performance degradation reported in the original link is really due to BaseCharFilter, so I think we need to benchmark and profile. > make BaseCharFilter more efficient in performance > - > > Key: LUCENE-2098 > URL: https://issues.apache.org/jira/browse/LUCENE-2098 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-2098.patch > > > Performance degradation in Solr 1.4 was reported. See: > http://www.lucidimagination.com/search/document/43c4bdaf5c9ec98d/html_stripping_slower_in_solr_1_4 > The inefficiency has been pointed out in BaseCharFilter javadoc by Mike: > {panel} > NOTE: This class is not particularly efficient. For example, a new class > instance is created for every call to addOffCorrectMap(int, int), which is > then appended to a private list. > {panel}
[jira] Commented: (LUCENE-2320) Add MergePolicy to IndexWriterConfig
[ https://issues.apache.org/jira/browse/LUCENE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845930#action_12845930 ] Shai Erera commented on LUCENE-2320: But it's MP which requires IW. So how will your policeman (like the name :)) proposal prevent it? I think that setting IW on MP is not such a bad thing. If MP needs it, then it needs it. The question now is how far we want to go w/ it: make it sort of final (in which case SetOnce makes sense) or settle w/ a setIW, which is simpler. This issue is more about moving MP into IWC than refactoring MP. I'd like to keep it focused on that as much as possible. I don't mean that we should stop discussing the refactoring, just that it can be done separately. After MP moves to IWC and all code is converted to use the new API, refactoring MP internally should not affect the API level, right? If you agree w/ that, then how do you propose to continue? W/ SetOnce or a simple setter? > Add MergePolicy to IndexWriterConfig > > > Key: LUCENE-2320 > URL: https://issues.apache.org/jira/browse/LUCENE-2320 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shai Erera >Assignee: Michael McCandless > Fix For: 3.1 > > Attachments: LUCENE-2320.patch > > > Now that IndexWriterConfig is in place, I'd like to move MergePolicy to it as > well. The change is not straightforward and so I've kept it for a separate > issue. MergePolicy requires an IndexWriter in its ctor, however none can be > passed to it before an IndexWriter actually exists. And today IW may create > an MP just for it to be overridden by the application one line afterwards. I > don't want to make the iw member of MP non-final, or settable by extending > classes, however it needs to remain protected so they can access it directly. > So the proposed changes are: > * Add a SetOnce object (to o.a.l.util), or Immutable, which can only be set > once (hence its name). It'll have the signature SetOnce<T> w/ *synchronized > set* and *T get()*. T will be declared volatile, so that get() won't be > synchronized. > * MP will define a *protected final SetOnce<IndexWriter> writer* instead of > the current writer. *NOTE: this is a bw break*. Any suggestions are welcome. > * MP will offer a public default ctor, together with a set(IndexWriter). > * IndexWriter will set itself on MP using set(this). Note that if set is > called more than once, it will throw an exception (AlreadySetException - or > does someone have a better suggestion, preferably an already existing Java > exception?). > That's the core idea. I'd like to post a patch soon, so I'd appreciate your > review and proposals.
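A minimal Java sketch of the SetOnce<T> proposed in the description: a synchronized set() that succeeds only once, an unsynchronized get() backed by a volatile field, and an AlreadySetException on a second set. The class bodies, and the FakeWriter and SketchMergePolicy stand-ins (used so the example compiles without Lucene), are illustrative assumptions, not the code in LUCENE-2320.patch.

```java
/** Thrown when set() is called on a SetOnce that already holds a value (name from the proposal). */
final class AlreadySetException extends IllegalStateException {
    AlreadySetException() {
        super("The object cannot be set twice");
    }
}

/** Sketch of the proposed holder: writable exactly once, readable without locking. */
final class SetOnce<T> {
    // volatile so that get() sees the value without taking the lock
    private volatile T obj;
    // guarded by "this"; distinguishes "never set" from "set to null"
    private boolean set;

    synchronized void set(T obj) {
        if (set) {
            throw new AlreadySetException();
        }
        this.obj = obj;
        set = true;
    }

    T get() {
        return obj;
    }
}

/** Hypothetical stand-in for IndexWriter. */
final class FakeWriter {}

/** Hypothetical stand-in for MergePolicy as described in the proposal. */
abstract class SketchMergePolicy {
    // protected and final, as proposed; the writer inside can be set only once
    protected final SetOnce<FakeWriter> writer = new SetOnce<>();

    public SketchMergePolicy() {}

    void set(FakeWriter w) {
        writer.set(w);
    }
}
```

With this shape, the MP can be constructed before any IndexWriter exists, and the writer later calls set(this) exactly once; a second call fails loudly instead of silently rebinding the policy.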
[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845919#action_12845919 ] Michael McCandless commented on LUCENE-2098: bq. I think this is why it got slower with my patch, in practice it didn't matter that this thing did 'backwards linear lookup' due to this reason? Ahh yes since presumably the test was simply looking up the offsets for the current token...
[jira] Commented: (LUCENE-1488) multilingual analyzer based on icu
[ https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845917#action_12845917 ] Uwe Schindler commented on LUCENE-1488: --- Attribute looks good! I would only fix toString() to match the default impl by using the syntax variableName + "=" + value, here "code="+getName(code). This makes AttributeSource.toString() look nice. > multilingual analyzer based on icu > -- > > Key: LUCENE-1488 > URL: https://issues.apache.org/jira/browse/LUCENE-1488 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/analyzers >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Minor > Fix For: 3.1 > > Attachments: ICUAnalyzer.patch, LUCENE-1488.patch, LUCENE-1488.patch, > LUCENE-1488.patch, LUCENE-1488.patch, LUCENE-1488.txt, LUCENE-1488.txt > > > The standard analyzer in lucene is not exactly unicode-friendly with regard > to breaking text into words, especially with respect to non-alphabetic > scripts. This is because it is unaware of unicode bounds properties. > I actually couldn't figure out how the Thai analyzer could possibly be > working until I looked at the jflex rules and saw that the codepoint range for > most of the Thai block was added to the alphanum specification. Defining the > exact codepoint ranges like this for every language could help with the > problem but you'd basically be reimplementing the bounds properties already > stated in the unicode standard. > In general it looks like this kind of behavior is bad in lucene even for > latin; for instance, the analyzer will break words around accent marks in > decomposed form. While most latin letter + accent combinations have composed > forms in unicode, some do not. (This is also an issue for asciifoldingfilter, > I suppose.) > I've got a partially tested standardanalyzer that uses icu Rule-based > BreakIterator instead of jflex.
Using this method you can define word > boundaries according to the unicode bounds properties. After getting it into > some good shape I'd be happy to contribute it for contrib but I wonder if > there's a better solution so that out of the box lucene will be more friendly to > non-ASCII text. Unfortunately it seems jflex does not support use of these > properties such as [\p{Word_Break = Extend}] so this is probably the major > barrier. > Thanks, > Robert
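Uwe's toString() suggestion, illustrated with a small Java sketch. ScriptAttributeSketch and its int code are hypothetical stand-ins for the attribute under review; as an assumption, getName() maps the code to a name through the JDK's Character.UnicodeScript (Java 7+) by ordinal, rather than through ICU's UScript as the real attribute would.

```java
/**
 * Hypothetical stand-in for the script attribute under review: it stores a script
 * code and prints itself as "code=<name>", the variableName=value style Uwe describes.
 */
final class ScriptAttributeSketch {
    private int code;

    void setCode(int code) {
        this.code = code;
    }

    int getCode() {
        return code;
    }

    /** Assumption: codes are Character.UnicodeScript ordinals, standing in for ICU script codes. */
    static String getName(int code) {
        return Character.UnicodeScript.values()[code].name();
    }

    @Override
    public String toString() {
        return "code=" + getName(code);
    }
}
```

Only the output shape matters here: printing the human-readable script name after "code=" keeps AttributeSource.toString() readable instead of showing a bare integer.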
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 5:42 AM, Michael McCandless wrote: > I think I like the 1st option best (lucene moves as subdir to solr's > current trunk SVN path), but I don't feel strongly. > > This'd mean one could simply checkout lucene alone and do everything > you can do today. > > But if you check out solr, you also get a full checkout of lucene, and > solr's build.xml will go and build lucene, copy over its jars to its > lib folder, and then do everything it currently does. > > I think? > > This small step is not much change over what we have today -- the code > simply moves, unchanged, except for some fixes to solr's build.xml to > go and build its lucene subdir first. Huh - I was leaning more toward putting solr under lucene because I thought that might be more acceptable to the lucene folks (actually, now lucene/solr folks) than vice-versa. But your points make perfect sense. > The bigger stuff, ideas on modules like renaming contrib->modules, > consolidating all analyzers, queries, queryparsers, highlighters, all > comes later. +1 -Yonik
[jira] Updated: (LUCENE-1488) multilingual analyzer based on icu
[ https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1488: Attachment: LUCENE-1488.patch uploading a dump of my workspace, so Uwe can review the new attribute.
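For readers unfamiliar with break iterators, here is a small Java sketch of word segmentation driven by a BreakIterator instead of hand-written codepoint ranges. It deliberately uses the JDK's java.text.BreakIterator as a stand-in; the ICU RuleBasedBreakIterator discussed in this issue follows the same first()/next()/DONE iteration style and also accepts custom rules. WordBreaks and words() are hypothetical names, and none of this is the ICUAnalyzer code in the attached patches.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: splits text into word segments using a word BreakIterator. */
final class WordBreaks {
    /** Returns the word segments of text, dropping pure whitespace and punctuation segments. */
    static List<String> words(String text) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        // Each consecutive pair of boundaries delimits one segment.
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String seg = text.substring(start, end);
            // Keep a segment only if it contains at least one letter or digit.
            if (seg.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(seg);
            }
        }
        return out;
    }
}
```

The boundary decisions come from the iterator's rules rather than from the tokenizer grammar, which is the property the issue description is after; how well a given script is handled then depends on the rule set of the iterator you plug in.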
Re: lucene and solr trunk
On 03/16/2010 10:09 AM, Yonik Seeley wrote: On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote: Also, we're in review-and-commit process, not commit-and-review. Changes have to be proposed, discussed and ideally attached to jira as patches first. Correction, just for the sake of avoiding future confusion (i.e. I'm not making any point about this thread): Lucene and Solr have always officially been CTR. For trunk, we normally use a bit of informal lazy consensus for anything big, hard, or that might be controversial... but we are not officially RTC. -Yonik In any case, this is a branch. People really want to enforce RTC on a branch??? Even if that was our official process on trunk (which I agree it has not been) that's not how the flex branch worked. That's not how the solr_cloud branch worked. That's not how other previous branches have worked. IMO - anyone should be able to create a branch for anything - to play around with whatever they want. We should encourage this. Branches are good. And they take up little space. Branch changes have to be proposed, discussed, and attached to JIRA? Uggg - I certainly hope not. Branches should be considered replacements for huge unwieldy patches. Do I have to propose and discuss before I put up a patch? -- - Mark http://www.lucidimagination.com
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote: > Also, we're in review-and-commit process, not commit-and-review. Changes > have to be > proposed, discussed and ideally attached to jira as patches first. Correction, just for the sake of avoiding future confusion (i.e. I'm not making any point about this thread): Lucene and Solr have always officially been CTR. For trunk, we normally use a bit of informal lazy consensus for anything big, hard, or that might be controversial... but we are not officially RTC. -Yonik
Re: lucene and solr trunk
On 03/16/2010 09:05 AM, Andrzej Bialecki wrote: On 2010-03-16 12:29, Mark Miller wrote: From our perspective, we would have been just as happy with a branch on my local hard drive! That would have taken longer to set up though. You could have used git instead. There is a good integration between git and svn, and it's much easier (a giant understatement...) to handle branching and merging in git, both between git branches and syncing with external svn. Yeah, we have actually discussed doing things like Git in the past - probably the main reason we didn't is the learning curve at the moment. I haven't used it yet. -- - Mark http://www.lucidimagination.com
Re: lucene and solr trunk
On 2010-03-16 12:29, Mark Miller wrote: From our perspective, we would have been just as happy with a branch on my local hard drive! That would have taken longer to set up though. You could have used git instead. There is a good integration between git and svn, and it's much easier (a giant understatement...) to handle branching and merging in git, both between git branches and syncing with external svn. -- Best regards, Andrzej Bialecki Information Retrieval, Semantic Web, Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
Re: lucene and solr trunk
My snap impression is that moving lucene to a sub-tree under SOLR would introduce some confusion in the minds of new folks looking at the code. *We* all know that Lucene stands by itself, but putting it under Solr makes that less obvious. I claim that there would be questions like "so can I just use Lucene without SOLR?". That said, the questions about release management, branching, tagging, etc. take complete precedence over minor confusion when the answer is "just go to directory X and checkout if you want Lucene only". FWIW Erick On Tue, Mar 16, 2010 at 8:30 AM, Robert Muir wrote: > On Tue, Mar 16, 2010 at 3:43 AM, Simon Willnauer > wrote: > > > One more thing which I wonder about even more is that this whole > > merging happens so quickly for reasons I don't see right now. I don't > > want to keep anybody from making progress but it appears like a rush > > to me. > > > By the way, the serious changes we applied to the branch, most of them > have been sitting in JIRA over 3 months not doing much: SOLR-1659. > > If you follow the linked issues, you can see all the stuff that got > put in the branch... the branch was helpful for me, as I could help > Mark with the "ton of little things", like TokenStreams embedded > inside JSP files :) > > As it's just a branch, if you want to go look at those patches > (especially anything I did) and provide technical feedback, that would > be great! > > But I think it's a mistake to say things are rushed when the work has > been done for months. > > -- > Robert Muir > rcm...@gmail.com
Re: lucene and solr trunk
On Mar 16, 2010, at 3:51 AM, Michael Busch wrote: > On 3/16/10 12:43 AM, Simon Willnauer wrote: Me too. I don't have the time to > follow IRC in addition to jira and mailing lists. I know I've been missing > stuff, because in the past I commented on jira issues and later was told that > my questions were already discussed thoroughly on IRC. I've also seen jira > issues that start with something like "Summary of IRC discussion:". I too am troubled by the likes of this and have been feeling much the same way, as many already know. It is on my list of things to discuss with the community, but I was going to wait a week or so before sending it, to let the volume die down a bit. -Grant
[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845887#action_12845887 ] Robert Muir commented on LUCENE-2098: - Mark did some quick tests and this patch only seems to make things slower. bq. Really most apps do not need all positions stored, i.e., they typically only need to see the current token. I think this is why it got slower with my patch, in practice it didn't matter that this thing did 'backwards linear lookup' due to this reason?
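To make the "backwards linear lookup" point concrete, here is a hedged Java sketch of an offset-correction map that stores offsets and cumulative diffs in parallel int arrays (no object allocated per addOffCorrectMap() call) and answers lookups with a fast path for the newest entry plus binary search otherwise. OffsetCorrectionMap, add() and correct() are illustrative names, not BaseCharFilter's API, and this is not the attached LUCENE-2098.patch. It assumes offsets are added in non-decreasing order, as a streaming CharFilter would produce them.

```java
import java.util.Arrays;

/** Hypothetical offset-correction map using parallel int arrays instead of one object per entry. */
final class OffsetCorrectionMap {
    private int[] offsets = new int[8];
    private int[] diffs = new int[8];
    private int size;

    /** From output offset {@code off} onward, the cumulative diff is {@code cumulativeDiff}. */
    void add(int off, int cumulativeDiff) {
        if (size == offsets.length) {
            offsets = Arrays.copyOf(offsets, size * 2);
            diffs = Arrays.copyOf(diffs, size * 2);
        }
        offsets[size] = off;
        diffs[size] = cumulativeDiff;
        size++;
    }

    /** Maps an output offset back to an input offset using the last entry at or before it. */
    int correct(int currentOff) {
        if (size == 0 || currentOff < offsets[0]) {
            return currentOff;
        }
        // Fast path: the current token usually sits at or past the newest entry,
        // which is also where a backwards linear scan would stop immediately.
        if (currentOff >= offsets[size - 1]) {
            return currentOff + diffs[size - 1];
        }
        int idx = Arrays.binarySearch(offsets, 0, size, currentOff);
        if (idx >= 0) {
            // Several entries may share an offset; the last one wins.
            while (idx + 1 < size && offsets[idx + 1] == currentOff) {
                idx++;
            }
        } else {
            // binarySearch returns -(insertionPoint) - 1; we want the entry before the insertion point.
            idx = -idx - 2;
        }
        return currentOff + diffs[idx];
    }
}
```

If most lookups are for the current token, as suggested above, both this version and a backwards scan usually finish after one comparison, so any speedup would mainly come from fewer allocations rather than from the search itself; benchmarking is still the way to confirm that.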
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 3:43 AM, Simon Willnauer wrote: > One more thing which I wonder about even more is that this whole > merging happens so quickly for reasons I don't see right now. I don't > want to keep anybody from making progress but it appears like a rush > to me. By the way, the serious changes we applied to the branch, most of them have been sitting in JIRA over 3 months not doing much: SOLR-1659. If you follow the linked issues, you can see all the stuff that got put in the branch... the branch was helpful for me, as I could help Mark with the "ton of little things", like TokenStreams embedded inside JSP files :) As it's just a branch, if you want to go look at those patches (especially anything I did) and provide technical feedback, that would be great! But I think it's a mistake to say things are rushed when the work has been done for months. -- Robert Muir rcm...@gmail.com
Re: lucene and solr trunk
On 03/16/2010 07:05 AM, Shalin Shekhar Mangar wrote: Wow, you guys are moving fast! That's a good thing. IRC is fine if you want to discuss something quickly. But it has its limitations. For example, I cannot follow IRC most of the time because I'm in a different time zone. But I don't want to stop anyone either. In fact, I can't do that. Nobody can. All I want to say is that once discussions have happened and a plan agreed upon, it may be a good idea to let solr-dev/java-dev know the plan. In this case I didn't know a new branch was created until I saw a commit notification and then Yonik's email. Hi Shalin - I like your attitude ;) - Yonik's email was the notification of the plan :) Though we had no plan. When Robert and I made the branch we had no plan really - we just needed a place to put together our patches and do the final work. We were trying to do it with patches, but it was becoming difficult. But when we started we had no real plan - just to see if we could get Solr up and running on Lucene 3.0.1 and then trunk. Anything beyond that, we have not planned for - and before that was even completed, there were emails to java-dev about it. But we conceived nothing beyond seeing if we could get Solr running on the latest Lucene. From our perspective, we would have been just as happy with a branch on my local hard drive! That would have taken longer to set up though. -- - Mark http://www.lucidimagination.com
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 3:44 PM, Mark Miller wrote: > On 03/16/2010 03:43 AM, Simon Willnauer wrote: > >> >> One more thing which I wonder about even more is that this whole >> merging happens so quickly for reasons I don't see right now. I don't >> want to keep anybody from making progress but it appears like a rush >> to me. >> >> > > Meh - I think you're just plain wrong about this. Anyone can work as fast as > they want on anything. Nothing has happened faster than the community wants > yet. You're too concerned. This is called discussion. Nothing has happened. In > my opinion, the whole freak out of what goes where in svn was so overblown > - it's so easy to move this stuff around at the drop of a hat. That's why it > was suggested we put a branch there and no one saw anything wrong with it > for the moment - everyone said, well we can just easily move it if someone > has an issue - which we did. Didn't expect the freak out though. Frankly, we > were just seeking a branch really, and didn't care where it went. > > Some of us are anxious to do some work - some of us are anxious to merge > some code - no one is forcing this stuff on the others at a rapid pace - > everyone gets their say as always. This is why we wanted a branch we could > commit what we wanted to. SVN locations make starting the merge of code > easier. They are easy to change. This is not like rushing index format > changes. It's src code location - it can be moved at the drop of a hat. The > sooner we resolve what we are going to do, the sooner we can start getting > more work done that we hoped to get done with this merge. This thread starts > that discussion. You can't start a discussion too early. Perhaps it leads to > another discussion first, but there is no such thing as rushing the start of > discussion. It doesn't say "figure it out by tomorrow, cause we are doing > this tomorrow." It doesn't say, figure this out by next week, because we > are doing this next week.
It says let's discuss where this is going to go. > > I think some people just need to relax, and discuss what they would like to > see and worry less about how fast others are working. Fast work is good. It > means more work. Nothing is going to happen until the community figures > things out. > > > BTW: I still have the impression that if I don't follow IRC constantly >> I'm missing important things. >> >> > That's your impression then. Follow IRC if you want. People talk all over > the place about Lucene/Solr - many times in places you can't follow - if it > didn't happen on the list, it didn't happen. Michael Busch follows up > saying, "people say it was discussed thoroughly on IRC" - so what? It > doesn't count as a valid point of reference - I haven't seen that, but you > can just tell someone who says so that they owe you an explanation. > > Wow, you guys are moving fast! That's a good thing. IRC is fine if you want to discuss something quickly. But it has its limitations. For example, I cannot follow IRC most of the time because I'm in a different time zone. But I don't want to stop anyone either. In fact, I can't do that. Nobody can. All I want to say is that once discussions have happened and a plan agreed upon, it may be a good idea to let solr-dev/java-dev know the plan. In this case I didn't know a new branch was created until I saw a commit notification and then Yonik's email. -- Regards, Shalin Shekhar Mangar.
Re: lucene and solr trunk
I think I like the 1st option best (lucene moves as subdir to solr's current trunk SVN path), but I don't feel strongly. This'd mean one could simply check out lucene alone and do everything you can do today. But if you check out solr, you also get a full checkout of lucene, and solr's build.xml will go and build lucene, copy over its jars to its lib folder, and then do everything it currently does. I think? This small step is not much change over what we have today -- the code simply moves, unchanged, except for some fixes to solr's build.xml to go and build its lucene subdir first. The bigger stuff, ideas on modules like renaming contrib->modules, consolidating all analyzers, queries, queryparsers, highlighters, all comes later. Mike On Mon, Mar 15, 2010 at 10:28 PM, Yonik Seeley wrote: > Due to a tremendous amount of work by our newly merged committer > corps, the get-on-lucene-trunk branch (branches/solr) is ready for > prime-time as the new solr trunk! Lucene and Solr need to move to a > common trunk for a host of reasons, including single patches that can > cover both, shared tags and branches, and shared test code w/o a test > jar. > > The current Lucene trunk is: .../lucene/java/trunk > The current Solr trunk is: .../lucene/solr/trunk > > So, we have a few options on where to put Solr's new trunk: > > Lucene moves to Solr's trunk: > /solr/trunk, /solr/trunk/lucene > > Solr moves to Lucene's trunk: > /java/trunk, /java/trunk/solr > > Both projects move to a new trunk: > /something/trunk/java, /something/trunk/solr > > -Yonik > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: lucene and solr trunk
On 03/16/2010 03:43 AM, Simon Willnauer wrote: One more thing which I wonder about even more is that this whole merging happens so quickly for reasons I don't see right now. I don't want to keep anybody from making progress but it appears like a rush to me. Meh - I think you're just plain wrong about this. Anyone can work as fast as they want on anything. Nothing has happened faster than the community wants yet. You're too concerned. This is called discussion. Nothing has happened. In my opinion, the whole freak out of what goes where in svn was so overblown - it's so easy to move this stuff around at the drop of a hat. That's why it was suggested we put a branch there and no one saw anything wrong with it for the moment - everyone said, well we can just easily move it if someone has an issue - which we did. Didn't expect the freak out though. Frankly, we were just seeking a branch really, and didn't care where it went. Some of us are anxious to do some work - some of us are anxious to merge some code - no one is forcing this stuff on the others at a rapid pace - everyone gets their say as always. This is why we wanted a branch we could commit what we wanted to. SVN locations make starting the merge of code easier. They are easy to change. This is not like rushing index format changes. It's src code location - it can be moved at the drop of the hat. The sooner we resolve what we are going to do, the sooner we can start getting more work done that we hoped to get done with this merge. This thread starts that discussion. You can't start a discussion too early. Perhaps it leads to another discussion first, but there is no such thing as rushing the start of discussion. It doesn't say "figure it out by tomorrow, cause we are doing this tomorrow. " It doesn't say, figure this out by next week, because we are doing this next week. It says let's discuss where this is going to go.
I think some people just need to relax, and discuss what they would like to see and worry less about how fast others are working. Fast work is good. It means more work. Nothing is going to happen until the community figures things out. BTW: I still have the impression that if I don't follow IRC constantly I'm missing important things. That's your impression then. Follow IRC if you want. People talk all over the place about Lucene/Solr - many times in places you can't follow - if it didn't happen on the list, it didn't happen. Michael Busch follows up saying, "people say it was discussed thoroughly on IRC" - so what? It doesn't count as a valid point of reference - I haven't seen that, but you can just tell someone that says that so - they owe you an explanation. -- - Mark http://www.lucidimagination.com
[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845788#action_12845788 ] Michael McCandless commented on LUCENE-2098: Ahh ok. Probably we should switch to parallel arrays here, to make it very fast... yes this will consume RAM (8 bytes per position, if we keep all of them). Really most apps do not need all positions stored, i.e., they typically only need to see the current token. So maybe we could make a filter that takes a "lookbehind size" and it'd only keep that number of mappings cached? That'd have to be greater than the max size of any token you may analyze, so hard to bound perfectly, but e.g. setting this to the max allowed token length in IndexWriter would guarantee that we'd never have a miss? For analyzers that buffer tokens... they'd have to set this max to infinity, or, ensure they remap the offsets before capturing the token's state? > make BaseCharFilter more efficient in performance > - > > Key: LUCENE-2098 > URL: https://issues.apache.org/jira/browse/LUCENE-2098 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-2098.patch > > > Performance degradation in Solr 1.4 was reported. See: > http://www.lucidimagination.com/search/document/43c4bdaf5c9ec98d/html_stripping_slower_in_solr_1_4 > The inefficiency has been pointed out in BaseCharFilter javadoc by Mike: > {panel} > NOTE: This class is not particularly efficient. For example, a new class > instance is created for every call to addOffCorrectMap(int, int), which is > then appended to a private list. > {panel} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
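The parallel-array idea can be pictured with a minimal sketch. ParallelArrayOffsetMap and its methods are hypothetical names, not Lucene's BaseCharFilter API: corrections live in two growable int[] arrays (the 8 bytes per mapping mentioned above), and a lookup binary-searches for the last recorded offset at or before the requested one. It assumes corrections are added in non-decreasing offset order, as a char filter scanning left to right would produce them; the bounded "lookbehind size" window is not shown.

```java
/** Hypothetical sketch: offset corrections in two parallel int arrays instead of one object per mapping. */
final class ParallelArrayOffsetMap {
  private int[] offsets = new int[8]; // offsets in the filtered text, non-decreasing
  private int[] diffs = new int[8];   // cumulative correction in effect from offsets[i] onward
  private int size = 0;

  /** Records that from offset off onward the cumulative correction is cumulativeDiff. */
  void add(int off, int cumulativeDiff) {
    if (size > 0 && off < offsets[size - 1]) {
      throw new IllegalArgumentException("offsets must be added in non-decreasing order");
    }
    if (size == offsets.length) {
      offsets = grow(offsets);
      diffs = grow(diffs);
    }
    offsets[size] = off;
    diffs[size] = cumulativeDiff;
    size++;
  }

  /** Maps an offset in the filtered text back to the original text (O(log n), no allocation). */
  int correct(int currentOff) {
    int lo = 0;
    int hi = size - 1;
    int found = -1; // index of the last offset <= currentOff
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (offsets[mid] <= currentOff) {
        found = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found < 0 ? currentOff : currentOff + diffs[found];
  }

  // Doubles capacity using only Java 1.5 APIs.
  private static int[] grow(int[] a) {
    int[] b = new int[a.length * 2];
    System.arraycopy(a, 0, b, 0, a.length);
    return b;
  }
}
```

For example, after add(5, 3) and add(10, 7), correct(9) returns 12 and correct(10) returns 17; offsets before the first mapping come back unchanged.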
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote: > On 3/16/10 12:43 AM, Simon Willnauer wrote: >> >> If my impression should be wrong or if I miss something please ignore >> the last paragraph. > > I feel exactly like you, Simon. I don't understand the rush. Also, we're > in review-and-commit process, not commit-and-review. Changes have to be > proposed, discussed and ideally attached to jira as patches first. There's obviously a lot of excitement driving the progress here, and there's been awesome progress. Things are moving fast, but... Remember that all commits/fast iterations are being done on a branch, so that people involved can make fast progress. When we land that branch onto trunk, there will be the usual scrutiny ("review then commit") of the changes that are going in, and this email was started to get the most important topic ("where does all this land, anyway") going, first. E.g., changes like a move to Java 1.6, disallowing compression in Solr's schema.xml, the Version changes percolating into Solr, all obviously need sizable review & discussion... >> BTW: I still have the impression that if I don't follow IRC constantly >> I'm missing important things. > > Me too. I don't have the time to follow IRC in addition to jira and > mailing lists. I know I've been missing stuff, because in the past I > commented on jira issues and later was told that my questions were already > discussed thoroughly on IRC. I've also seen jira issues that start with > something like "Summary of IRC discussion:". This is a hard problem... IRC is a very good tool to enable those that have the time (and I agree it's A LOT OF TIME -- I can't keep up with it either) to work together. Fast design discussions are a powerful way to bat around random ideas, and I'd say IRC has already produced a number of good ideas for improving Lucene (opened as issues, lately...).
But the thing to remember is that, of all the crazy discussions that happen on IRC (and there are MANY that don't pan out), when a "real" idea pans out, it must then go through the normal process -- turn into an issue, comments are added summarizing the pros/cons that were discussed on IRC, a patch is created and must be reviewed, iterated, and then committed. The CTR process is still intact... it's just that IRC is a faster way for some devs to discuss things that may turn into real ideas (or, may get dropped on the floor). Does anyone know how other projects fold in IRC...? Mike
[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845785#action_12845785 ] Uwe Schindler commented on LUCENE-2098: --- bq. Why did this cause Solr to slow down...? Did Solr previously have a more efficient impl and then they cut over to Lucene's? Solr used another Filter in 1.3.
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845780#action_12845780 ] Michael McCandless commented on LUCENE-2312: bq. In thinking about the terms dictionary, we're going to run into concurrency issues, right, if we just use TreeMap? Right, we need a concurrent data structure here. It's OK if there've been changes to this shared data structure since a reader was opened -- that reader knows its max doc id and so it can skip a term if the first doc id in that term is greater than that max. > Search on IndexWriter's RAM Buffer > -- > > Key: LUCENE-2312 > URL: https://issues.apache.org/jira/browse/LUCENE-2312 > Project: Lucene - Java > Issue Type: New Feature > Components: Search >Affects Versions: 3.0.1 >Reporter: Jason Rutherglen >Assignee: Michael Busch > Fix For: 3.1 > > > In order to offer users near realtime search, without incurring > an indexing performance penalty, we can implement search on > IndexWriter's RAM buffer. This is the buffer that is filled in > RAM as documents are indexed. Currently the RAM buffer is > flushed to the underlying directory (usually disk) before being > made searchable. > Today's Lucene based NRT systems must incur the cost of merging > segments, which can slow indexing. > Michael Busch has good suggestions regarding how to handle deletes using max > doc ids. > https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 > The area that isn't fully fleshed out is the terms dictionary, > which needs to be sorted prior to queries executing. Currently > IW implements a specialized hash table. Michael B has a > suggestion here: > https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
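The "skip terms newer than the reader" rule can be pictured with a sketch on top of java.util.concurrent.ConcurrentSkipListMap, which keeps keys sorted and allows inserts while other threads iterate (its iterators are weakly consistent). This is a hypothetical illustration, not the design settled on in this issue: RamTermsDictionary and TermInfo are made-up names, and it assumes a single indexing thread assigns increasing doc IDs, so a reader's max doc ID defines its point-in-time view. Note that ConcurrentSkipListMap was added in Java 6, while Lucene's build at the time targeted Java 1.5. A real implementation would also have to filter the postings inside each term by the same max doc ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sorted, concurrently readable in-memory terms dictionary. */
final class RamTermsDictionary {
  /** Per-term state; only the first doc ID matters for this sketch. */
  static final class TermInfo {
    final int firstDocId;
    TermInfo(int firstDocId) { this.firstDocId = firstDocId; }
  }

  // Sorted by term; safe for concurrent insertion while readers iterate.
  private final ConcurrentSkipListMap<String, TermInfo> terms =
      new ConcurrentSkipListMap<String, TermInfo>();

  /** Writer side: called when docId contains term. putIfAbsent keeps the first doc ID seen. */
  void addTerm(String term, int docId) {
    terms.putIfAbsent(term, new TermInfo(docId));
  }

  /** Reader side: terms visible to a reader whose max doc ID is maxDocId, in sorted order. */
  List<String> visibleTerms(int maxDocId) {
    List<String> visible = new ArrayList<String>();
    for (Map.Entry<String, TermInfo> e : terms.entrySet()) {
      // Terms first seen after this reader was opened are skipped.
      if (e.getValue().firstDocId <= maxDocId) {
        visible.add(e.getKey());
      }
    }
    return visible;
  }
}
```

A reader opened after doc 1 keeps seeing only the terms that existed at that point, even while the writer keeps adding new ones.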
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845778#action_12845778 ] Michael McCandless commented on LUCENE-2312: {quote} The prototype I'm experimenting with has a fixed-length postings format for the in-memory representation (in TermsHash). Basically every posting has 4 bytes, so I can use int[] arrays (instead of the byte[] pools). The first 3 bytes are used for an absolute docID (not delta-encoded). This limits the max in-memory segment size to 2^24 docs. The 1 remaining byte is used for the position. With a max doc length of 140 characters you can fit every possible position in a byte - what a luxury! If a term occurs multiple times in the same doc, then the TermDocs just skips multiple occurrences with the same docID and increments the freq. Again, the same term doesn't occur often in super short docs. The int[] slices also don't have forward pointers, as in Lucene's TermsHash, but backward pointers. In real-time search you often want a strongly time-biased ranking. A PostingList object has a pointer that points to the last posting (this statement is not 100% correct for visibility reasons across threads, but we can imagine it this way for now). A TermDocs can now traverse the posting lists in opposite order. Skipping can be done by following pointers to previous slices directly, or by binary search within a slice. {quote} This sounds nice! This would be a custom indexing chain for docs guaranteed not to be over 255 positions in length, right?
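The 4-byte posting layout described in the quote (3 bytes of absolute docID, 1 byte of position) can be sketched with plain bit operations. PackedPostings and its methods are hypothetical names; the prototype's int[] slices with backward pointers are simplified here to one growable int[] walked from the end, which yields the same newest-first order. Because a docID can occupy all 24 high bits, the packed int may be negative, so the unsigned shift >>> is used to unpack it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-width postings: one int per posting, newest-first traversal. */
final class PackedPostings {
  static final int MAX_DOC = 1 << 24; // 3 bytes of docID
  static final int MAX_POS = 1 << 8;  // 1 byte of position

  static int pack(int docId, int position) {
    if (docId < 0 || docId >= MAX_DOC) {
      throw new IllegalArgumentException("docId out of range: " + docId);
    }
    if (position < 0 || position >= MAX_POS) {
      throw new IllegalArgumentException("position out of range: " + position);
    }
    return (docId << 8) | position; // docID in the high 24 bits, position in the low 8
  }

  static int docId(int posting) {
    return posting >>> 8; // unsigned shift: the packed value may be negative
  }

  static int position(int posting) {
    return posting & 0xFF;
  }

  private int[] postings = new int[4];
  private int size = 0;

  void add(int docId, int position) {
    int packed = pack(docId, position); // validate before touching size
    if (size == postings.length) {
      int[] bigger = new int[size * 2];
      System.arraycopy(postings, 0, bigger, 0, size);
      postings = bigger;
    }
    postings[size++] = packed;
  }

  /** Returns {docId, freq} pairs, newest document first. */
  List<int[]> docsNewestFirst() {
    List<int[]> out = new ArrayList<int[]>();
    for (int i = size - 1; i >= 0; i--) {
      int doc = docId(postings[i]);
      int last = out.size() - 1;
      if (last >= 0 && out.get(last)[0] == doc) {
        out.get(last)[1]++; // same doc as the previous posting: bump freq
      } else {
        out.add(new int[] {doc, 1});
      }
    }
    return out;
  }
}
```

Collapsing adjacent postings that share a docID into a frequency mirrors the TermDocs behavior described in the quote.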
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845777#action_12845777 ] Michael McCandless commented on LUCENE-2312: bq. The tricky part is to make sure that a reader always sees a consistent snapshot of the index. At the same time a reader must not follow pointers to non-published locations (e.g. array blocks). Right, I'm just not familiar specifically with what the JMM says about one thread writing to a byte[] and another thread reading it. In general, for our usage, the reader threads will never read into an area that has not yet been written to. So that works in our favor (they can't cache those bytes if they didn't read them). EXCEPT the CPU will have loaded the bytes on a word boundary and so if our reader thread reads only 1 byte, and no more (because this is now the end of the posting), the CPU may very well have pulled in the following 7 bytes (for example) and then illegally (according to our needs) cache them. We'd better write some serious tests for this... including reader threads that just enum the postings for a single rarish term over and over while writer threads are indexing docs that occasionally have that term. I think that's the worst case for JMM violation since the #bytes cached is small. It's too bad there isn't higher-level control over CPU caching from Java. E.g., in our usage, if we could call a System.flushCPUCache whenever a thread enters a newly reopened reader, that would suffice, because when accessing postings via a given Reader we want point-in-time searching anyway, and so any bytes cached by the CPU are perfectly fine.
We only need a CPU cache flush when a reader is reopened.
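Java gives no direct control over CPU caches; the Java Memory Model instead guarantees visibility through happens-before edges. A common single-writer pattern is to write the bytes with plain stores and then publish the new end offset through a volatile field: a reader that reads the volatile first, and never reads past the offset it observed, is guaranteed to see those bytes, whatever the hardware prefetched. The JLS also rules out word tearing for byte[] elements, so writes to neighboring bytes cannot clobber each other. This is a hypothetical sketch (PublishedByteBuffer is a made-up name, with fixed capacity and exactly one writer thread assumed), not how IndexWriter's RAM buffer actually works, and it does not replace the concurrent stress tests proposed above.

```java
/** Hypothetical single-writer, multi-reader append-only byte buffer. */
final class PublishedByteBuffer {
  private final byte[] bytes;
  // Updated only by the writer thread; volatile so each publish creates a happens-before edge.
  private volatile int publishedLength = 0;

  PublishedByteBuffer(int capacity) {
    bytes = new byte[capacity];
  }

  /** Writer thread only: plain writes into the array first, then the volatile publish. */
  void append(byte[] src) {
    int start = publishedLength;
    System.arraycopy(src, 0, bytes, start, src.length); // throws if capacity is exceeded
    publishedLength = start + src.length;
  }

  /** Any thread: reads the volatile length first and never looks past it. */
  byte[] snapshot() {
    int len = publishedLength;
    byte[] copy = new byte[len];
    System.arraycopy(bytes, 0, copy, 0, len);
    return copy;
  }
}
```

A snapshot taken before a later append keeps its old contents, which matches the point-in-time semantics a reopened reader expects.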
[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845776#action_12845776 ] Michael McCandless commented on LUCENE-2098: Patch looks like it should be a good net/net improvement -- lookups of the offset correction should now be fast (though insertion cost is probably higher -- we likely create 3 new objects (2 Integers, one TreeMap$Entry) per insert) but I expect that's a good tradeoff.
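A TreeMap-based lookup, in the spirit of the patch but not its actual code, can use floorEntry to find the last correction at or before a given offset in O(log n). TreeMapOffsetMap is a hypothetical name. Each put may box two ints (values outside the Integer cache allocate) and allocates a map entry, which is where the insertion cost comes from. TreeMap.floorEntry was added in Java 6; on Java 5, headMap(currentOff + 1).lastKey() finds the same key when the head map is non-empty.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical TreeMap-backed offset correction map. */
final class TreeMapOffsetMap {
  private final TreeMap<Integer, Integer> corrections = new TreeMap<Integer, Integer>();

  /** Autoboxing may allocate two Integer objects, and TreeMap allocates an entry per new key. */
  void add(int off, int cumulativeDiff) {
    corrections.put(off, cumulativeDiff);
  }

  /** Returns currentOff plus the correction recorded at the greatest offset <= currentOff. */
  int correct(int currentOff) {
    Map.Entry<Integer, Integer> e = corrections.floorEntry(currentOff); // Java 6+
    return e == null ? currentOff : currentOff + e.getValue();
  }
}
```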
[jira] Updated: (LUCENE-2098) make BaseCharFilter more efficient in performance
[ https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2098: --- Affects Version/s: (was: 2.9) 3.1 Why did this cause Solr to slow down...? Did Solr previously have a more efficient impl and then they cut over to Lucene's?
[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845771#action_12845771 ] Chris Male commented on LUCENE-2310: Hi Shai, {quote} I like the idea of Document implementing Iterable, but how does that solve the case where someone wants to query how many fields a document has? {quote} It doesn't, but then I'd add a numFields() method maybe. It seems like something with a small use case and so having it as a method on the side seems ideal. {quote} Will you still have getFields(), only now it will return an unmodifiable collection? {quote} Yes and no. getFields will remain but with a modifiable list. I will then deprecate the method and recommend people use the Iterable. This gives everybody a chance to migrate during the 3.x versions. {quote} Maybe just do: (1) Doc implements Iterable and (2) Doc exposes numFields(), add(Field)? {quote} Yup, let's do that. Unfortunately getFields will remain until 4.0. {quote} About remove(field), I thought of a possible scenario though I still don't think it's interesting enough - suppose that you pass your Document through a processing pipeline/chain, each handler adds fields as metadata to the Document. For example, annotators. It might be that a field A exists, only for a handler down the chain to understand A's meaning and then replace it w/ A1 and A2. For that you'll want to be able to remove a field ... I guess we could add a remove method to Document, and if it's called while the fields are iterated on, a CME will be thrown, which is perfectly fine with me. {quote} With the idea of having remove(...) I am trying to foresee what people might be doing via getFields() and thus are not going to be able to do when it's gone. We will have the ability to add and iterate, so having the functionality to remove seems to complete it. I completely agree that if something happens and a CME is thrown, then that problem should be left to the user.
I think this clarifies the direction. Document will be changed as follows:
- Document will become Iterable
- getFields() will be deprecated in favour of the Iterable
- numFields() will be added, returning the number of fields
- remove(String) will be added, allowing someone to remove Fields with the given name. If a CME occurs, that's up to the user to handle.

Cheers Shai! > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch > > > In order to move field-type-like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality than storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possibly Fieldable), moving much of the functionality into Field and > FieldType.
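The planned Document changes can be sketched with a minimal stand-in. SimpleDocument and SimpleField are hypothetical names, not Lucene's Document and Field; the sketch only shows Iterable-based access, numFields(), and a name-based remove(String). Because ArrayList iterators are fail-fast (on a best-effort basis), calling remove while iterating typically surfaces as a ConcurrentModificationException, which is exactly the case the thread leaves to the user. Iteration goes through Collections.unmodifiableList, so callers cannot remove fields via the iterator; add and remove are the only mutators.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a field: just a name and a string value. */
final class SimpleField {
  final String name;
  final String value;
  SimpleField(String name, String value) { this.name = name; this.value = value; }
}

/** Hypothetical Document sketch: Iterable, numFields(), add(), remove(String). */
final class SimpleDocument implements Iterable<SimpleField> {
  private final List<SimpleField> fields = new ArrayList<SimpleField>();

  void add(SimpleField field) {
    fields.add(field);
  }

  int numFields() {
    return fields.size();
  }

  /** Removes every field with the given name; doing so mid-iteration typically triggers a CME. */
  void remove(String name) {
    for (Iterator<SimpleField> it = fields.iterator(); it.hasNext();) {
      if (it.next().name.equals(name)) {
        it.remove();
      }
    }
  }

  /** Read-only, fail-fast iteration over the fields in insertion order. */
  public Iterator<SimpleField> iterator() {
    return Collections.unmodifiableList(fields).iterator();
  }
}
```

In the pipeline scenario from the thread, a handler would call remove("A") and then add A1 and A2 after it has finished iterating, not inside the loop.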
RE: lucene and solr trunk
Hi, > And Lucene is on Java 1.5 and should be compiled with a 1.5 compiler, > where Solr seems to be on 1.6 since yesterday? (Yonik added something > to common-build.xml). On my development system I have no Java 1.6 > installed at all as default build, I always use Java 1.5 for building > Lucene. If we merge that and have both on different JVMs the same > problems as with 1.4/1.5 start. Developers use 1.6 methods because > their compiler does not warn them. So everybody working on Lucene > should at least have a Java 1.5 compiler and try to compile his changes > before committing. I do this (as I use 1.5 for developing), 1.6 on some > of our servers. > > So: If merge, keep both on Java 1.5 !!! I changed common-build.xml in the new solr branch to Java 1.5 again, as there is currently no reason to change this and especially as it was not discussed anywhere. Java 1.5 as base for both solr and lucene is better, and the few new features of Java 1.6 do not justify moving up. I have my development area configured with Java 1.5 and I only develop Lucene in 1.5. That way I am sure not to use the wrong methods when creating patches. You can still tell users to run with JRE 1.6, but development should stay on 1.5 for now. Uwe
Re: lucene and solr trunk
On 3/16/10 12:43 AM, Simon Willnauer wrote: If my impression should be wrong or if I miss something please ignore the last paragraph. I feel exactly like you, Simon. I don't understand the rush. Also, we're in review-and-commit process, not commit-and-review. Changes have to be proposed, discussed and ideally attached to jira as patches first. BTW: I still have the impression that if I don't follow IRC constantly I'm missing important things. Me too. I don't have the time to follow IRC in addition to jira and mailing lists. I know I've been missing stuff, because in the past I commented on jira issues and later was told that my questions were already discussed thoroughly on IRC. I've also seen jira issues that start with something like "Summary of IRC discussion:". Michael
Re: lucene and solr trunk
I completely agree with Uwe and Hoss. These questions need to be addressed first. I still want to be able to only check out Lucene code and run the Lucene build independently from Solr. And Lucene needs to be able to release without Solr and the branching/tagging needs to support that as Uwe points out. Michael On 3/16/10 12:18 AM, Uwe Schindler wrote: Hi all, I don't want to be against all other developers that voted +1 for the SVN "merge", but I am not happy with it. Most importantly for the reasons Hoss mentioned: : prime-time as the new solr trunk! Lucene and Solr need to move to a : common trunk for a host of reasons, including single patches that can : cover both, shared tags and branches, and shared test code w/o a test : jar. Without a clearer picture of how people envision development "overhead" working as we move forward, it's really hard to understand how any of these ideas make sense... 1) how should the automated build process(es) work? 2) how are we going to do branching/tagging for releases? particularly in situations where one product is ready for a release and the other isn't? 3) how are we going to deal with minor bug fix release tagging? 4) should it be possible for people to check out Lucene-Java w/o checking out Solr? Those are important questions and not simple to solve! (I suspect a whole lot of people who only care about the core library are going to really adamantly not want to have to check out all of Solr just to work on the core) Exactly! The Solr checkout is really huge because of thousands of JAR files and so on. The worst thing we could do would be to merge all those JARs into one general lib folder or the like. Please do not! Lucene-core should stay a lib without any external deps. : Both projects move to a new trunk: : /something/trunk/java, /something/trunk/solr This would be the only option we have. This new folder could simply contain two dirs below and a build.xml in the top level that delegates and builds first lucene, then solr.
But you can also do this with separate checkouts and a simple script downloaded from the wiki. The problems of this approach far outweigh the positive side: In the original vote, we said, Lucene can release without Solr: Releasing (I was the last release manager) involves things like creating branches and tags. In SVN, if you create a branch, you copy everything from under trunk (or another branch) to a new folder below branches (for tags, below tags). "tags" on most SVN servers has an additional limitation, that it is not possible to change anything under "tags" except copying. If we have this combined trunk folder and Lucene wants to release and creates a branch/tag, Solr is forced to do this too. OK, you could say, we just branch the folder lucene and leave solr where it is. But that would be against conventions and the branch checkout could not live on its own. I just repeat: we wanted to merge devs and not codebase! And merging devs is a "code change" clearly. And Lucene is on Java 1.5 and should be compiled with a 1.5 compiler, where Solr seems to be on 1.6 since yesterday? (Yonik added something to common-build.xml). On my development system I have no Java 1.6 installed at all as default build, I always use Java 1.5 for building Lucene. If we merge that and have both on different JVMs the same problems as with 1.4/1.5 start. Developers use 1.6 methods because their compiler does not warn them. So everybody working on Lucene should at least have a Java 1.5 compiler and try to compile his changes before committing. I do this (as I use 1.5 for developing), 1.6 on some of our servers. So: If merge, keep both on Java 1.5 !!! my gut says something like this will make the most sense, assuming "/something/trunk" == "/java/trunk" and "java" actually means "core" ... And that is how it looks currently and I am fine with it! ie: this discussion should really be part and parcel with how contribs should be reorged. That is exactly what should be done.
We should not simply copy the folders somewhere now for some "development simplification" that isn't really one and only opens up more problems! I propose another idea for now, until the "module" decision is [DISCUSS]ed and [VOTE]d: Let's create a new project folder with trunk and branches for combined trunk development in SVN (this can later become the folder for the module development). This folder simply contains a delegating build.xml (delegating the common tasks like build and test and so on to solr and trunk). The folder simply uses svn:externals SVN props to link the current solr and lucene trunks as subfolders. So developers who want to work on both can simply check out this folder, and SVN will resolve the externals. As this is trunk development, the externals will be without revision numbers and relative for the http(s) problem (SVN 1.5+ required). For testing flex, we create a branch of this folder
Re: lucene and solr trunk
On Tue, Mar 16, 2010 at 8:18 AM, Uwe Schindler wrote: > Hi all, > > I don't want to be against all the other developers who voted +1 for the SVN > "merge", but I am not happy with it. Most importantly for the reasons Hoss > mentioned: > >> : prime-time as the new solr trunk! Lucene and Solr need to move to a >> : common trunk for a host of reasons, including single patches that can >> : cover both, shared tags and branches, and shared test code w/o a test >> : jar. >> >> Without a clearer picture of how people envision development "overhead" >> working as we move forward, it's really hard to understand how any of >> these ideas make sense... >> 1) how should the automated build process(es) work? >> 2) how are we going to do branching/tagging for releases? >> particularly >> in situations where one product is ready for a release and the other >> isn't? >> 3) how are we going to deal with minor bug fix release tagging? >> 4) should it be possible for people to check out Lucene-Java w/o >> checking out Solr? > > These are important questions and not simple to solve! > >> (i suspect a whole lot of people who only care about the core library >> are >> going to really adamantly not want to have to check out all of Solr >> just >> to work on the core) > > Exactly! The Solr checkout is really huge because of thousands of JAR files > and so on. The worst thing we could do would be to merge all those JARs into > one general lib folder or something like that. Please do not! Lucene-core should stay a > lib without any external deps. > >> : Both projects move to a new trunk: >> : /something/trunk/java, /something/trunk/solr > > This would be the only option we have. This new folder could simply contain > two dirs below and a build.xml in the top level that delegates and builds > first lucene, then solr. But you can also do this with separate checkouts and > a simple script downloaded from the wiki.
> > The problems of this approach far outweigh the benefits: > > In the original vote, we said Lucene can release without Solr. > Releasing (I was the last release manager) involves things like creating > branches and tags. In SVN, if you create a branch, you copy everything from > under trunk (or another branch) to a new folder below branches (for tags, > below tags). "tags" on most SVN servers has an additional limitation: > it is not possible to change anything under "tags" except by copying. > > If we have this combined trunk folder and Lucene wants to release and > creates a branch/tag, Solr is forced to do this too. OK, you could say we > just branch the lucene folder and leave solr where it is. But that would be > against conventions, and the branch checkout could not live on its own. > > I'll just repeat: we wanted to merge devs and not the codebase! And merging devs is > clearly a "code change". > > And Lucene is on Java 1.5 and should be compiled with a 1.5 compiler, whereas > Solr seems to be on 1.6 since yesterday? (Yonik added something to > common-build.xml). On my development system I have no Java 1.6 installed at > all as the default build; I always use Java 1.5 for building Lucene. If we merge > and have both on different JVMs, the same problems as with 1.4/1.5 > start: developers use 1.6 methods because their compiler does not warn them. > So everybody working on Lucene should at least have a Java 1.5 compiler and try > to compile their changes before committing. I do this (as I use 1.5 for > developing), 1.6 on some of our servers. > > So: if we merge, keep both on Java 1.5!!! > >> my gut says something like this will make the most sense, assuming >> "/something/trunk" == "/java/trunk" and "java" actually means "core" >> ... > > And that is how it looks currently, and I am fine with it! > >> ie: this discussion should really be part and parcel with how contribs >> should be reorged. > > That is exactly what should be done.
We should not simply copy the folders > somewhere now for some "development simplification" that isn't really one and > only opens up more problems! > > I propose another idea for now, until the "module" decision is [DISCUSS]ed and > [VOTE]d: > > Let's create a new project folder with trunk and branches for combined trunk > development in SVN (this can later become the folder for the module development). > This folder simply contains a delegating build.xml (delegating the common > tasks like build and test and so on to solr and trunk). The folder simply uses > svn:externals SVN props to link the current solr and lucene trunks as subfolders. > So developers who want to work on both can simply check out this folder, and > SVN will resolve the externals. As this is trunk development, the externals > will be without revision numbers and relative for the http(s) problem (SVN 1.5+ > required). +1 - if I recall correctly, that is what Uwe and I proposed on IRC when Solr was first copied. This makes a lot of sense, as it does not break anybody's checkouts and enables all "Solcene" developers to
Re: [DISCUSS] Do away with Contrib Committers and make core committers
On Mon, Mar 15, 2010 at 10:54 PM, Ryan McKinley wrote: >> >> Personally I'd prefer we just stop adding them, and the current ones work >> their way up like normal if they are so inclined, or the ones that are not >> even around anymore can just stay as they are. >> That sounds reasonable to me too. Still, we should make sure contrib committers are able to commit to the new "modules", or wherever we decide the contrib stuff ends up. It would be odd if I were no longer able to commit to the analyzers just because they have moved out of contrib into something new. simon > > This seems reasonable to me. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: lucene and solr trunk
Hi all, I don't want to be against all the other developers who voted +1 for the SVN "merge", but I am not happy with it. Most importantly for the reasons Hoss mentioned: > : prime-time as the new solr trunk! Lucene and Solr need to move to a > : common trunk for a host of reasons, including single patches that can > : cover both, shared tags and branches, and shared test code w/o a test > : jar. > > Without a clearer picture of how people envision development "overhead" > working as we move forward, it's really hard to understand how any of > these ideas make sense... > 1) how should the automated build process(es) work? > 2) how are we going to do branching/tagging for releases? > particularly > in situations where one product is ready for a release and the other > isn't? > 3) how are we going to deal with minor bug fix release tagging? > 4) should it be possible for people to check out Lucene-Java w/o > checking out Solr? These are important questions and not simple to solve! > (i suspect a whole lot of people who only care about the core library > are > going to really adamantly not want to have to check out all of Solr > just > to work on the core) Exactly! The Solr checkout is really huge because of thousands of JAR files and so on. The worst thing we could do would be to merge all those JARs into one general lib folder or something like that. Please do not! Lucene-core should stay a lib without any external deps. > : Both projects move to a new trunk: > : /something/trunk/java, /something/trunk/solr This would be the only option we have. This new folder could simply contain two dirs below and a build.xml in the top level that delegates and builds first lucene, then solr. But you can also do this with separate checkouts and a simple script downloaded from the wiki.
The problems of this approach far outweigh the benefits: In the original vote, we said Lucene can release without Solr. Releasing (I was the last release manager) involves things like creating branches and tags. In SVN, if you create a branch, you copy everything from under trunk (or another branch) to a new folder below branches (for tags, below tags). "tags" on most SVN servers has an additional limitation: it is not possible to change anything under "tags" except by copying. If we have this combined trunk folder and Lucene wants to release and creates a branch/tag, Solr is forced to do this too. OK, you could say we just branch the lucene folder and leave solr where it is. But that would be against conventions, and the branch checkout could not live on its own. I'll just repeat: we wanted to merge devs and not the codebase! And merging devs is clearly a "code change". And Lucene is on Java 1.5 and should be compiled with a 1.5 compiler, whereas Solr seems to be on 1.6 since yesterday? (Yonik added something to common-build.xml). On my development system I have no Java 1.6 installed at all as the default build; I always use Java 1.5 for building Lucene. If we merge and have both on different JVMs, the same problems as with 1.4/1.5 start: developers use 1.6 methods because their compiler does not warn them. So everybody working on Lucene should at least have a Java 1.5 compiler and try to compile their changes before committing. I do this (as I use 1.5 for developing), 1.6 on some of our servers. So: if we merge, keep both on Java 1.5!!! > my gut says something like this will make the most sense, assuming > "/something/trunk" == "/java/trunk" and "java" actually means "core" > ... And that is how it looks currently, and I am fine with it! > ie: this discussion should really be part and parcel with how contribs > should be reorged. That is exactly what should be done.
We should not simply copy the folders somewhere now for some "development simplification" that isn't really one and only opens up more problems! I propose another idea for now, until the "module" decision is [DISCUSS]ed and [VOTE]d: Let's create a new project folder with trunk and branches for combined trunk development in SVN (this can later become the folder for the module development). This folder simply contains a delegating build.xml (delegating the common tasks like build and test and so on to solr and trunk). The folder simply uses svn:externals SVN props to link the current solr and lucene trunks as subfolders. So developers who want to work on both can simply check out this folder, and SVN will resolve the externals. As this is trunk development, the externals will be without revision numbers and relative for the http(s) problem (SVN 1.5+ required). For testing flex, we create a branch of this folder, still pointing to solr-trunk but to the flex branch in Lucene. One task of the main build.xml would be to copy all JAR files produced by Lucene into the correct build folder in Solr. I hope you all understand me, but I am against merging trunks (for now) until we have a clear module decision. Uwe -
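Uwe's warning that "developers use 1.6 methods because their compiler does not warn them" can be made concrete. A JDK 6 javac run with -source 1.5 -target 1.5 still compiles against the 1.6 class library unless -bootclasspath points at a 1.5 runtime, so calls such as String.isEmpty() or Arrays.copyOf(...) (both added in Java 1.6) compile cleanly and only fail with NoSuchMethodError on a 1.5 JVM. The class below is a hypothetical sketch, not Lucene code; the names Java5Compat, isEmpty and grow are illustrative only. It shows 1.5-compatible equivalents for those two calls.

```java
import java.util.Arrays;

/** Hypothetical helper (not from Lucene) using only APIs available in Java 1.5. */
public class Java5Compat {

    // String.isEmpty() only exists since Java 1.6; length() == 0 also works on 1.5.
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }

    // Arrays.copyOf(int[], int) only exists since Java 1.6; System.arraycopy works on 1.5.
    // Copies src into a new array of newLength, padding with zeros or truncating.
    static int[] grow(int[] src, int newLength) {
        int[] dst = new int[newLength];
        System.arraycopy(src, 0, dst, 0, Math.min(src.length, newLength));
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(""));        // true
        System.out.println(isEmpty("lucene"));  // false
        // Arrays.toString(int[]) is available since Java 1.5.
        System.out.println(Arrays.toString(grow(new int[] {1, 2, 3}, 5))); // [1, 2, 3, 0, 0]
    }
}
```

Compiling such code with an actual 1.5 compiler (or with -bootclasspath set to a 1.5 rt.jar) is what catches the 1.6-only calls at build time rather than at runtime.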