Re: Moving towards Lucene 4.0
On Thu, May 19, 2011 at 7:44 PM, Chris Hostetter wrote:
>
> : I think we should focus on everything that's *infrastructure* in 4.0, so
> : that we can develop additional features in subsequent 4.x releases. If we
> : end up releasing 4.0 just to discover many things will need to wait to 5.0,
> : it'll be a big loss.
>
> the catch with that approach (i'm speaking generally here, not with any of
> these particular lucene examples in mind) is that it's hard to know that
> the infrastructure really makes sense until you've built a bunch of stuff
> on it -- i think Josh Bloch has a paper where he says that you shouldn't
> publish an API abstraction until you've built at least 3 *real*
> (ie: not just toy or example) implementations of that API.

yeah big +1 - everybody should watch that tech talk...
( http://www.youtube.com/watch?v=aAb7hSCtvGw )

>
> it would be really easy to say "the infrastructure for X, Y, and Z is all
> in 4.0, features that leverage this infra will start coming in 4.1" and
> then discover on the way to 4.1 that we botched the APIs.
>
> what does this mean concretely for the specific "big ticket" changes that
> we've got on trunk? ... i dunno, just my word of caution.
>
> : > we just started the discussion about Lucene 3.2 and releasing more
> : > often. Yet, I think we should also start planning for Lucene 4.0 soon.
> : > We have tons of stuff in trunk that people want to have and we can't
> : > just keep on talking about it - we need to push this out to our users.
>
> I agree, but i think the other approach we should take is to be more
> aggressive about reviewing things that would be good candidates for
> backporting.
>
> If we feel like some feature has a well defined API on trunk, and it's got
> good tests, and people have been using it and filing bugs and helping to
> make it better then we should consider it a candidate for backporting --
> if the merge itself looks like it would be a huge pain in the ass we don't
> *have* to backport, but we should at least look.

I agree, we should backport what we can, but we have to ensure some balance between amount of work and benefit. I mean, one big thing which we could port is DWPT; almost all the other features rely on the new flex API. So I am not sure if there is anything else, really... well, DocValues could be easy, actually.

I still want to remind everyone that we should not wait too long with 4.0!

simon

>
> That may not help for any of the "big ticket" infra changes discussed in
> this thread (where we know it really needs to wait for a major release)
> but it would definitely help with the "get features out to users faster"
> issue.
>
> -Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving towards Lucene 4.0
On Thu, May 19, 2011 at 21:44, Chris Hostetter wrote:
>
> : I think we should focus on everything that's *infrastructure* in 4.0, so
> : that we can develop additional features in subsequent 4.x releases. If we
> : end up releasing 4.0 just to discover many things will need to wait to 5.0,
> : it'll be a big loss.
>
> the catch with that approach (i'm speaking generally here, not with any of
> these particular lucene examples in mind) is that it's hard to know that
> the infrastructure really makes sense until you've built a bunch of stuff
> on it -- i think Josh Bloch has a paper where he says that you shouldn't
> publish an API abstraction until you've built at least 3 *real*
> (ie: not just toy or example) implementations of that API.
>
> it would be really easy to say "the infrastructure for X, Y, and Z is all
> in 4.0, features that leverage this infra will start coming in 4.1" and
> then discover on the way to 4.1 that we botched the APIs.

How do I express my profound love for these words, while remaining chaste? : )

> what does this mean concretely for the specific "big ticket" changes that
> we've got on trunk? ... i dunno, just my word of caution.
>
> : > we just started the discussion about Lucene 3.2 and releasing more
> : > often. Yet, I think we should also start planning for Lucene 4.0 soon.
> : > We have tons of stuff in trunk that people want to have and we can't
> : > just keep on talking about it - we need to push this out to our users.
>
> I agree, but i think the other approach we should take is to be more
> aggressive about reviewing things that would be good candidates for
> backporting.
>
> If we feel like some feature has a well defined API on trunk, and it's got
> good tests, and people have been using it and filing bugs and helping to
> make it better then we should consider it a candidate for backporting --
> if the merge itself looks like it would be a huge pain in the ass we don't
> *have* to backport, but we should at least look.
>
> That may not help for any of the "big ticket" infra changes discussed in
> this thread (where we know it really needs to wait for a major release)
> but it would definitely help with the "get features out to users faster"
> issue.
>
> -Hoss

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
Re: Moving towards Lucene 4.0
: I think we should focus on everything that's *infrastructure* in 4.0, so
: that we can develop additional features in subsequent 4.x releases. If we
: end up releasing 4.0 just to discover many things will need to wait to 5.0,
: it'll be a big loss.

the catch with that approach (i'm speaking generally here, not with any of these particular lucene examples in mind) is that it's hard to know that the infrastructure really makes sense until you've built a bunch of stuff on it -- i think Josh Bloch has a paper where he says that you shouldn't publish an API abstraction until you've built at least 3 *real* (ie: not just toy or example) implementations of that API.

it would be really easy to say "the infrastructure for X, Y, and Z is all in 4.0, features that leverage this infra will start coming in 4.1" and then discover on the way to 4.1 that we botched the APIs.

what does this mean concretely for the specific "big ticket" changes that we've got on trunk? ... i dunno, just my word of caution.

: > we just started the discussion about Lucene 3.2 and releasing more
: > often. Yet, I think we should also start planning for Lucene 4.0 soon.
: > We have tons of stuff in trunk that people want to have and we can't
: > just keep on talking about it - we need to push this out to our users.

I agree, but i think the other approach we should take is to be more aggressive about reviewing things that would be good candidates for backporting.

If we feel like some feature has a well defined API on trunk, and it's got good tests, and people have been using it and filing bugs and helping to make it better then we should consider it a candidate for backporting -- if the merge itself looks like it would be a huge pain in the ass we don't *have* to backport, but we should at least look.

That may not help for any of the "big ticket" infra changes discussed in this thread (where we know it really needs to wait for a major release) but it would definitely help with the "get features out to users faster" issue.

-Hoss
Re: Moving towards Lucene 4.0
On Mon, May 16, 2011 at 5:24 PM, Shai Erera wrote:
> We anyway seem to mark every new API as @lucene.experimental these days, so
> we shouldn't have too much problem when 4.0 is out :).
>
> Experimental API is subject to change at any time. We can consider that as
> an option as well (maybe it adds another option to Robert's?).
>
> Though personally, I'm not a big fan of this notion - I think we deceive
> ourselves and users when we have @experimental on a "stable" branch. Any
> @experimental API on trunk today falls into this bucket after 4.0 is out.
> And I'm sure there are a couple in 3.x already.
>
> Don't get me wrong - I don't suggest we should stop using it. But I think we
> should consider to review the @experimental API before every "stable"
> release, and reduce it over time, not increase it.

+1

>
> Shai
>
> On Mon, May 16, 2011 at 4:20 PM, Robert Muir wrote:
>>
>> On Mon, May 16, 2011 at 9:12 AM, Simon Willnauer wrote:
>> > I have to admit that branch is very rough and the API is super hard to
>> > use. For now!
>> > Let's not be dragged away into discussion how this API should look like
>> > there will be time
>> > for that.
>>
>> +1, this is what i really meant by "decide how to handle". I don't
>> think we will be able to quickly "decide how to fix" the branch
>> itself, i think it's really complicated. But we can admit it's really
>> complicated and won't be solved very soon, and try to figure out a
>> release strategy with this in mind.
>>
>> (p.s. sorry simon, you got two copies of this message i accidentally
>> hit reply instead of reply-all)
Re: Moving towards Lucene 4.0
We anyway seem to mark every new API as @lucene.experimental these days, so we shouldn't have too much problem when 4.0 is out :).

Experimental API is subject to change at any time. We can consider that as an option as well (maybe it adds another option to Robert's?).

Though personally, I'm not a big fan of this notion - I think we deceive ourselves and users when we have @experimental on a "stable" branch. Any @experimental API on trunk today falls into this bucket after 4.0 is out. And I'm sure there are a couple in 3.x already.

Don't get me wrong - I don't suggest we should stop using it. But I think we should consider to review the @experimental API before every "stable" release, and reduce it over time, not increase it.

Shai

On Mon, May 16, 2011 at 4:20 PM, Robert Muir wrote:
> On Mon, May 16, 2011 at 9:12 AM, Simon Willnauer wrote:
> > I have to admit that branch is very rough and the API is super hard to
> > use. For now!
> > Let's not be dragged away into discussion how this API should look like
> > there will be time
> > for that.
>
> +1, this is what i really meant by "decide how to handle". I don't
> think we will be able to quickly "decide how to fix" the branch
> itself, i think it's really complicated. But we can admit it's really
> complicated and won't be solved very soon, and try to figure out a
> release strategy with this in mind.
>
> (p.s. sorry simon, you got two copies of this message i accidentally
> hit reply instead of reply-all)
Re: Moving towards Lucene 4.0
On Mon, May 16, 2011 at 9:12 AM, Simon Willnauer wrote:
> I have to admit that branch is very rough and the API is super hard to
> use. For now!
> Let's not be dragged away into discussion how this API should look like
> there will be time
> for that.

+1, this is what i really meant by "decide how to handle". I don't think we will be able to quickly "decide how to fix" the branch itself, i think it's really complicated. But we can admit it's really complicated and won't be solved very soon, and try to figure out a release strategy with this in mind.

(p.s. sorry simon, you got two copies of this message i accidentally hit reply instead of reply-all)
Re: Moving towards Lucene 4.0
On Mon, May 16, 2011 at 2:57 PM, Robert Muir wrote:
> On Mon, May 16, 2011 at 8:48 AM, Uwe Schindler wrote:
>> Sorry to be negative,
>>
>>> - BulkPostings (my +1 since I want to enable positional scoring on all
>>> queries)
>>
>> My problem is the really crappy and unusable API of BulkPostings (wait for
>> my talk at Lucene Rev...). For anybody else than Mike, Yonik and yourself
>> that’s unusable. I tried to understand even the simple
>> MultiTermQueryWrapperFilter - easy on trunk, horrible on branch - sorry
>> that’s a no-go.
>>
>> It's code duplication everywhere and unreadable.
>>
>
> I don't think you should apologize for being negative, it's true there
> is a ton of work to do here before that branch is "ready". That's why
> in my email I tried to brainstorm some alternative ways we could get
> some of these features into the hands of users without being held up
> by this work.
>

I have to admit that branch is very rough and the API is super hard to use. For now! Let's not be dragged away into discussion how this API should look like there will be time for that.

I agree with Robert that I see a large amount of work left on that branch though, so maybe we should move the positional scoring (LUCENE-2878) over to trunk as another option.

I think we should not wait much longer with Lucene 4.0, so I lean towards Robert's option 2 even if we need to pay the price for a major change in 5.0. I am not sure if we really need to change much API for Realtime Search since this should be hidden in IW, IndexingChain and IW#getReader() - I kind of like the idea to be close to 4.0 :)

simon
Re: Moving towards Lucene 4.0
On Mon, May 16, 2011 at 8:48 AM, Uwe Schindler wrote:
> Sorry to be negative,
>
>> - BulkPostings (my +1 since I want to enable positional scoring on all
>> queries)
>
> My problem is the really crappy and unusable API of BulkPostings (wait for my
> talk at Lucene Rev...). For anybody else than Mike, Yonik and yourself that’s
> unusable. I tried to understand even the simple MultiTermQueryWrapperFilter -
> easy on trunk, horrible on branch - sorry that’s a no-go.
>
> It's code duplication everywhere and unreadable.
>

I don't think you should apologize for being negative, it's true there is a ton of work to do here before that branch is "ready". That's why in my email I tried to brainstorm some alternative ways we could get some of these features into the hands of users without being held up by this work.
RE: Moving towards Lucene 4.0
Sorry to be negative,

> - BulkPostings (my +1 since I want to enable positional scoring on all
> queries)

My problem is the really crappy and unusable API of BulkPostings (wait for my talk at Lucene Rev...). For anybody else than Mike, Yonik and yourself that’s unusable. I tried to understand even the simple MultiTermQueryWrapperFilter - easy on trunk, horrible on branch - sorry that’s a no-go.

It's code duplication everywhere and unreadable.

Uwe
Re: Moving towards Lucene 4.0
On Mon, May 16, 2011 at 7:52 AM, Simon Willnauer wrote:
> Hey folks,
>
> we just started the discussion about Lucene 3.2 and releasing more
> often. Yet, I think we should also start planning for Lucene 4.0 soon.
> We have tons of stuff in trunk that people want to have and we can't
> just keep on talking about it - we need to push this out to our users.
> From my perspective we should decide on at least the big outstanding
> issues like:
>
> - BulkPostings (my +1 since I want to enable positional scoring on all
> queries)

in my own opinion, this is probably the most important to decide how to handle. I think it might not be good if we introduce a new major version branch (4.x) with flexible indexing if the postings APIs limit us from actually taking advantage of it.

I think that we should look at what happens (shai brought up a previous thread about this) when 4.x is released: 3.x goes into bugfix mode and we open up 5.x. So, we want to make sure we actually have things stable enough (from an API and flexibility perspective) that we will be able to get some life out of the 4.x series and add new features to it.

I think there is a lot left to do with bulkpostings and it's going to require a lot of work, but at the same time I really don't like that we have serious improvements/features in trunk (some have been there now for years) still unreleased and not yet available to users.

Some other crazy ideas (just for discussion):

* we could try to be more aggressive about backporting and getting more "life" out of 3.x, and getting some of these features to users. For example, perhaps things like DWPT, DocValues, more efficient terms index, automaton, etc could be backported safely. the advantage here is that we get the features to the users, but the disadvantage is it would be a lot of effort backporting.
* we could decide that we do actually have enough flexibility now in 4.x to get several releases out of it (e.g. containing features like docvalues, realtime search, etc), even though we know it's limited to some extent, and defer api-breakers like bulkpostings/flexscoring to 5.x. the advantage here is that we could start looking at 4.x releasing very soon, but there are some disadvantages, like forcing people to change a lot of their code to upgrade but for less "gain", and potentially limiting ourselves in the 4.x branch by its APIs.
* we could do nothing at all, and keep going like we are going now, deciding that we are actually getting enough useful features into 3.x releases that it's ok for us to block 4.0 on some of these tougher issues like bulkpostings. The disadvantage is of course even longer wait time for the features that have been sitting in trunk a while, but it keeps 3.x stable and is less work for us.
Re: Moving towards Lucene 4.0
+1

Mike

http://blog.mikemccandless.com

On Mon, May 16, 2011 at 7:52 AM, Simon Willnauer wrote:
> Hey folks,
>
> we just started the discussion about Lucene 3.2 and releasing more
> often. Yet, I think we should also start planning for Lucene 4.0 soon.
> We have tons of stuff in trunk that people want to have and we can't
> just keep on talking about it - we need to push this out to our users.
> From my perspective we should decide on at least the big outstanding
> issues like:
>
> - BulkPostings (my +1 since I want to enable positional scoring on all
> queries)
> - DocValues (pretty close)
> - FlexibleScoring (+- 0 I think we should wait how gsoc turns out and
> decide then?)
> - Codec Support for Stored Fields, Norms & TV (not sure about that but
> seems doable at least an API and current impl as default)
> - Realtime Search aka. Searchable Ram Buffer (this seems quite far
> though while I would love to have it it seems we need to push this to
> > 4.0)
>
> For DocValues the decision seems easy since we are very close with
> that and I expect it to land until end of June. I want to kick off the
> discussion here so nothing will be set to stone really but I think we
> should plan to release somewhere near the end of the year?!
>
> simon
Re: Moving towards Lucene 4.0
> I think we should also start planning for Lucene 4.0 soon.

+1 !

I think we should focus on everything that's *infrastructure* in 4.0, so that we can develop additional features in subsequent 4.x releases. If we end up releasing 4.0 just to discover many things will need to wait to 5.0, it'll be a big loss.

So Codecs seem like *infra* to me, and can we make sure the necessary API is in place for RT Search and stuff? I think a lot of the new API in 4.0 is @lucene.experimental anyway?

In short, if we have enough API support in 4.0 already, we can release it and develop features in 4.x releases. The only thing we should 'push' is stuff that requires serious API changes (I doubt there are many like that, maybe just Codecs support for the stuff you mentioned).

Shai

On Mon, May 16, 2011 at 2:52 PM, Simon Willnauer < simon.willna...@googlemail.com> wrote:
> Hey folks,
>
> we just started the discussion about Lucene 3.2 and releasing more
> often. Yet, I think we should also start planning for Lucene 4.0 soon.
> We have tons of stuff in trunk that people want to have and we can't
> just keep on talking about it - we need to push this out to our users.
> From my perspective we should decide on at least the big outstanding
> issues like:
>
> - BulkPostings (my +1 since I want to enable positional scoring on all
> queries)
> - DocValues (pretty close)
> - FlexibleScoring (+- 0 I think we should wait how gsoc turns out and
> decide then?)
> - Codec Support for Stored Fields, Norms & TV (not sure about that but
> seems doable at least an API and current impl as default)
> - Realtime Search aka. Searchable Ram Buffer (this seems quite far
> though while I would love to have it it seems we need to push this to
> > 4.0)
>
> For DocValues the decision seems easy since we are very close with
> that and I expect it to land until end of June. I want to kick off the
> discussion here so nothing will be set to stone really but I think we
> should plan to release somewhere near the end of the year?!
>
> simon
Moving towards Lucene 4.0
Hey folks,

we just started the discussion about Lucene 3.2 and releasing more often. Yet, I think we should also start planning for Lucene 4.0 soon. We have tons of stuff in trunk that people want to have and we can't just keep on talking about it - we need to push this out to our users. From my perspective we should decide on at least the big outstanding issues like:

- BulkPostings (my +1 since I want to enable positional scoring on all queries)
- DocValues (pretty close)
- FlexibleScoring (+- 0 I think we should wait how gsoc turns out and decide then?)
- Codec Support for Stored Fields, Norms & TV (not sure about that but seems doable at least an API and current impl as default)
- Realtime Search aka. Searchable Ram Buffer (this seems quite far though while I would love to have it it seems we need to push this to > 4.0)

For DocValues the decision seems easy since we are very close with that and I expect it to land until end of June. I want to kick off the discussion here so nothing will be set to stone really but I think we should plan to release somewhere near the end of the year?!

simon