Re: Moving towards Lucene 4.0

2011-05-19 Thread Simon Willnauer
On Thu, May 19, 2011 at 7:44 PM, Chris Hostetter
 wrote:
>
> : I think we should focus on everything that's *infrastructure* in 4.0, so
> : that we can develop additional features in subsequent 4.x releases. If we
> : end up releasing 4.0 just to discover many things will need to wait to 5.0,
> : it'll be a big loss.
>
> the catch with that approach (i'm speaking generally here, not with any of
> these particular lucene examples in mind) is that it's hard to know that
> the infrastructure really makes sense until you've built a bunch of stuff
> on it -- i think Josh Bloch has a paper where he says that you shouldn't
> publish an API abstraction until you've built at least 3 *real*
> (ie: not just toy or example) implementations of that API.

yeah big +1 - everybody should watch that tech talk... (
http://www.youtube.com/watch?v=aAb7hSCtvGw )
>
> it would be really easy to say "the infrastructure for X, Y, and Z is all
> in 4.0, features that leverage this infra will start coming in 4.1" and
> then discover on the way to 4.1 that we botched the APIs.
>
> what does this mean concretely for the specific "big ticket" changes that
> we've got on trunk? ... i dunno, just my word of caution.
>
> : > we just started the discussion about Lucene 3.2 and releasing more
> : > often. Yet, I think we should also start planning for Lucene 4.0 soon.
> : > We have tons of stuff in trunk that people want to have and we can't
> : > just keep on talking about it - we need to push this out to our users.
>
> I agree, but i think the other approach we should take is to be more
> aggressive about reviewing things that would be good candidates for
> backporting.
>
> If we feel like some feature has a well defined API on trunk, and it's got
> good tests, and people have been using it and filing bugs and helping to
> make it better then we should consider it a candidate for backporting --
> if the merge itself looks like it would be a huge pain in the ass we don't
> *have* to backport, but we should at least look.

I agree, we should backport what we can, but we have to keep some
balance between the amount of work and the benefit. One big thing we
could port is DWPT; almost all the other features rely on the new flex
API, so I am not sure there is much else. Well, DocValues could
actually be easy.

I still want to remind everyone that we should not wait too long with 4.0!

simon
>
> That may not help for any of the "big ticket" infra changes discussed in
> this thread (where we know it really needs to wait for a major release)
> but it would definitely help with the "get features out to users faster"
> issue.
>
>
>
> -Hoss
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Moving towards Lucene 4.0

2011-05-19 Thread Earwin Burrfoot
On Thu, May 19, 2011 at 21:44, Chris Hostetter  wrote:
>
> : I think we should focus on everything that's *infrastructure* in 4.0, so
> : that we can develop additional features in subsequent 4.x releases. If we
> : end up releasing 4.0 just to discover many things will need to wait to 5.0,
> : it'll be a big loss.
>
> the catch with that approach (i'm speaking generally here, not with any of
> these particular lucene examples in mind) is that it's hard to know that
> the infrastructure really makes sense until you've built a bunch of stuff
> on it -- i think Josh Bloch has a paper where he says that you shouldn't
> publish an API abstraction until you've built at least 3 *real*
> (ie: not just toy or example) implementations of that API.
>
> it would be really easy to say "the infrastructure for X, Y, and Z is all
> in 4.0, features that leverage this infra will start coming in 4.1" and
> then discover on the way to 4.1 that we botched the APIs.

How do I express my profound love for these words, while remaining chaste? : )

> what does this mean concretely for the specific "big ticket" changes that
> we've got on trunk? ... i dunno, just my word of caution.
>
> : > we just started the discussion about Lucene 3.2 and releasing more
> : > often. Yet, I think we should also start planning for Lucene 4.0 soon.
> : > We have tons of stuff in trunk that people want to have and we can't
> : > just keep on talking about it - we need to push this out to our users.
>
> I agree, but i think the other approach we should take is to be more
> aggressive about reviewing things that would be good candidates for
> backporting.
>
> If we feel like some feature has a well defined API on trunk, and it's got
> good tests, and people have been using it and filing bugs and helping to
> make it better then we should consider it a candidate for backporting --
> if the merge itself looks like it would be a huge pain in the ass we don't
> *have* to backport, but we should at least look.
>
> That may not help for any of the "big ticket" infra changes discussed in
> this thread (where we know it really needs to wait for a major release)
> but it would definitely help with the "get features out to users faster"
> issue.
>
>
>
> -Hoss
>



-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785




Re: Moving towards Lucene 4.0

2011-05-19 Thread Chris Hostetter

: I think we should focus on everything that's *infrastructure* in 4.0, so
: that we can develop additional features in subsequent 4.x releases. If we
: end up releasing 4.0 just to discover many things will need to wait to 5.0,
: it'll be a big loss.

the catch with that approach (i'm speaking generally here, not with any of 
these particular lucene examples in mind) is that it's hard to know that 
the infrastructure really makes sense until you've built a bunch of stuff 
on it -- i think Josh Bloch has a paper where he says that you shouldn't 
publish an API abstraction until you've built at least 3 *real* 
(ie: not just toy or example) implementations of that API.

it would be really easy to say "the infrastructure for X, Y, and Z is all 
in 4.0, features that leverage this infra will start coming in 4.1" and 
then discover on the way to 4.1 that we botched the APIs.

what does this mean concretely for the specific "big ticket" changes that 
we've got on trunk? ... i dunno, just my word of caution.

: > we just started the discussion about Lucene 3.2 and releasing more
: > often. Yet, I think we should also start planning for Lucene 4.0 soon.
: > We have tons of stuff in trunk that people want to have and we can't
: > just keep on talking about it - we need to push this out to our users.

I agree, but i think the other approach we should take is to be more 
aggressive about reviewing things that would be good candidates for
backporting.

If we feel like some feature has a well defined API on trunk, and it's got 
good tests, and people have been using it and filing bugs and helping to 
make it better then we should consider it a candidate for backporting -- 
if the merge itself looks like it would be a huge pain in the ass we don't
*have* to backport, but we should at least look.

That may not help for any of the "big ticket" infra changes discussed in 
this thread (where we know it really needs to wait for a major release)
but it would definitely help with the "get features out to users faster" 
issue.



-Hoss




Re: Moving towards Lucene 4.0

2011-05-17 Thread Simon Willnauer
On Mon, May 16, 2011 at 5:24 PM, Shai Erera  wrote:
> We anyway seem to mark every new API as @lucene.experimental these days, so
> we shouldn't have too much problem when 4.0 is out :).
>
> Experimental API is subject to change at any time. We can consider that as
> an option as well (maybe it adds another option to Robert's?).
>
> Though personally, I'm not a big fan of this notion - I think we deceive
> ourselves and users when we have @experimental on a "stable" branch. Any
> @experimental API on trunk today falls into this bucket after 4.0 is out.
> And I'm sure there are a couple in 3.x already.
>
> Don't get me wrong - I don't suggest we should stop using it. But I think we
> should consider reviewing the @experimental API before every "stable"
> release, and reduce it over time, not increase it.

+1
>
> Shai
>
> On Mon, May 16, 2011 at 4:20 PM, Robert Muir  wrote:
>>
>> On Mon, May 16, 2011 at 9:12 AM, Simon Willnauer
>>  wrote:
>> > I have to admit that branch is very rough and the API is super hard to
>> > use. For now!
>> > Lets not be dragged away into discussion how this API should look like
>> > there will be time
>> > for that.
>>
>> +1, this is what i really meant by "decide how to handle". I don't
>> think we will be able to quickly "decide how to fix" the branch
>> itself, i think its really complicated. But we can admit its really
>> complicated and won't be solved very soon, and try to figure out a
>> release strategy with this in mind.
>>
>> (p.s. sorry simon, you got two copies of this message i accidentally
>> hit reply instead of reply-all)
>>
>
>




Re: Moving towards Lucene 4.0

2011-05-16 Thread Shai Erera
We anyway seem to mark every new API as @lucene.experimental these days, so
we shouldn't have too much problem when 4.0 is out :).

Experimental API is subject to change at any time. We can consider that as
an option as well (maybe it adds another option to Robert's?).

Though personally, I'm not a big fan of this notion - I think we deceive
ourselves and users when we have @experimental on a "stable" branch. Any
@experimental API on trunk today falls into this bucket after 4.0 is out.
And I'm sure there are a couple in 3.x already.

Don't get me wrong - I don't suggest we should stop using it. But I think we
should consider reviewing the @experimental API before every "stable"
release, and reduce it over time, not increase it.

Shai

On Mon, May 16, 2011 at 4:20 PM, Robert Muir  wrote:

> On Mon, May 16, 2011 at 9:12 AM, Simon Willnauer
>  wrote:
> > I have to admit that branch is very rough and the API is super hard to
> > use. For now!
> > Lets not be dragged away into discussion how this API should look like
> > there will be time
> > for that.
>
> +1, this is what i really meant by "decide how to handle". I don't
> think we will be able to quickly "decide how to fix" the branch
> itself, i think its really complicated. But we can admit its really
> complicated and won't be solved very soon, and try to figure out a
> release strategy with this in mind.
>
> (p.s. sorry simon, you got two copies of this message i accidentally
> hit reply instead of reply-all)
>


Re: Moving towards Lucene 4.0

2011-05-16 Thread Robert Muir
On Mon, May 16, 2011 at 9:12 AM, Simon Willnauer
 wrote:
> I have to admit that branch is very rough and the API is super hard to
> use. For now!
> Lets not be dragged away into discussion how this API should look like
> there will be time
> for that.

+1, this is what I really meant by "decide how to handle". I don't
think we will be able to quickly "decide how to fix" the branch
itself; I think it's really complicated. But we can admit it's really
complicated and won't be solved very soon, and try to figure out a
release strategy with this in mind.

(p.s. sorry Simon, you got two copies of this message; I accidentally
hit reply instead of reply-all)




Re: Moving towards Lucene 4.0

2011-05-16 Thread Simon Willnauer
On Mon, May 16, 2011 at 2:57 PM, Robert Muir  wrote:
> On Mon, May 16, 2011 at 8:48 AM, Uwe Schindler  wrote:
>> Sorry to be negative,
>>
>>> - BulkPostings (my +1 since I want to enable positional scoring on all 
>>> queries)
>>
>> My problem is the really crappy and unusable API of BulkPostings (wait for 
>> my talk at Lucene Rev...). For anybody else than Mike, Yonik and yourself 
>> that’s unusable. I tried to understand even the simple 
>> MultiTermQueryWrapperFilter - easy on trunk, horrible on branch - sorry 
>> that’s a no-go.
>>
>> Its code duplication everywhere and unreadable.
>>
>
> I don't think you should apologize for being negative, its true there
> is a ton of work to do here before that branch is "ready". Thats why
> in my email I tried to brainstorm some alternative ways we could get
> some of these features into the hands of users without being held up
> by this work.
>

I have to admit that the branch is very rough and the API is super hard
to use. For now!
Let's not get dragged into a discussion of how this API should look;
there will be time for that. I agree with Robert that there is a large
amount of work left on that branch, though, so maybe we should move the
positional scoring (LUCENE-2878) over to trunk as another option.

I think we should not wait much longer with Lucene 4.0, so I lean
towards Robert's option 2 even if we need to pay the price
for a major change in 5.0. I am not sure we really need to change
much API for Realtime Search, since this should be hidden in IW,
IndexingChain and IW#getReader() - I kind of like the idea of being
close to 4.0 :)

simon




Re: Moving towards Lucene 4.0

2011-05-16 Thread Robert Muir
On Mon, May 16, 2011 at 8:48 AM, Uwe Schindler  wrote:
> Sorry to be negative,
>
>> - BulkPostings (my +1 since I want to enable positional scoring on all 
>> queries)
>
> My problem is the really crappy and unusable API of BulkPostings (wait for my 
> talk at Lucene Rev...). For anybody else than Mike, Yonik and yourself that’s 
> unusable. I tried to understand even the simple MultiTermQueryWrapperFilter - 
> easy on trunk, horrible on branch - sorry that’s a no-go.
>
> Its code duplication everywhere and unreadable.
>

I don't think you should apologize for being negative; it's true there
is a ton of work to do here before that branch is "ready". That's why
in my email I tried to brainstorm some alternative ways we could get
some of these features into the hands of users without being held up
by this work.




RE: Moving towards Lucene 4.0

2011-05-16 Thread Uwe Schindler
Sorry to be negative,

> - BulkPostings (my +1 since I want to enable positional scoring on all 
> queries)

My problem is the really crappy and unusable API of BulkPostings (wait for my
talk at Lucene Rev...). For anybody other than Mike, Yonik and yourself that’s
unusable. I tried to understand even the simple MultiTermQueryWrapperFilter -
easy on trunk, horrible on branch - sorry, that’s a no-go.

It's code duplication everywhere, and unreadable.

Uwe





Re: Moving towards Lucene 4.0

2011-05-16 Thread Robert Muir
On Mon, May 16, 2011 at 7:52 AM, Simon Willnauer
 wrote:
> Hey folks,
>
> we just started the discussion about Lucene 3.2 and releasing more
> often. Yet, I think we should also start planning for Lucene 4.0 soon.
> We have tons of stuff in trunk that people want to have and we can't
> just keep on talking about it - we need to push this out to our users.
> From my perspective we should decide on at least the big outstanding
> issues like:
>
> - BulkPostings (my +1 since I want to enable positional scoring on all 
> queries)

In my opinion, this is probably the most important one to decide how
to handle. I think it might not be good if we introduce a new major
version branch (4.x) with flexible indexing if the postings APIs limit
us from actually taking advantage of it.
I think we should keep in mind (Shai brought up a previous thread
about this) that when 4.x is released, 3.x goes into bugfix mode and we
open up 5.x. So we want to make sure we actually have things stable
enough (from an API and flexibility perspective) that we will be able
to get some life out of the 4.x series and add new features to it.

I think there is a lot left to do with bulkpostings and it's going to
require a lot of work, but at the same time I really don't like that
we have serious improvements/features in trunk (some have been there
now for years) still unreleased and not yet available to users.

Some other crazy ideas (just for discussion):
* we could try to be more aggressive about backporting and getting
more "life" out of 3.x, and getting some of these features to users.
For example, perhaps things like DWPT, DocValues, more efficient terms
index, automaton, etc. could be backported safely. The advantage here
is that we get the features to the users, but the disadvantage is that
it would be a lot of effort backporting.
* we could decide that we do actually have enough flexibility now in
4.x to get several releases out of it (e.g. containing features like
docvalues, realtime search, etc.), even though we know it's limited to
some extent, and defer API-breakers like bulkpostings/flexscoring to
5.x. The advantage here is that we could start looking at releasing
4.x very soon, but there are some disadvantages, like forcing
people to change a lot of their code to upgrade but for less
"gain", and potentially limiting ourselves in the 4.x branch by its
APIs.
* we could do nothing at all, and keep going like we are going now,
deciding that we are actually getting enough useful features into 3.x
releases that it's OK for us to block 4.0 on some of these tougher
issues like bulkpostings. The disadvantage is of course an even longer
wait time for the features that have been sitting in trunk a while,
but it keeps 3.x stable and is less work for us.




Re: Moving towards Lucene 4.0

2011-05-16 Thread Michael McCandless
+1

Mike

http://blog.mikemccandless.com

On Mon, May 16, 2011 at 7:52 AM, Simon Willnauer
 wrote:
> Hey folks,
>
> we just started the discussion about Lucene 3.2 and releasing more
> often. Yet, I think we should also start planning for Lucene 4.0 soon.
> We have tons of stuff in trunk that people want to have and we can't
> just keep on talking about it - we need to push this out to our users.
> From my perspective we should decide on at least the big outstanding
> issues like:
>
> - BulkPostings (my +1 since I want to enable positional scoring on all 
> queries)
> - DocValues (pretty close)
> - FlexibleScoring (+- 0 I think we should wait how gsoc turns out and
> decide then?)
> - Codec Support for Stored Fields, Norms & TV (not sure about that but
> seems doable at least an API and current impl as default)
> - Realtime Search aka. Searchable Ram Buffer (this seems quite far
> though while I would love to have it it seems we need to push this
> to > 4.0)
>
> For DocValues the decision seems easy since we are very close with
> that and I expect it to land until end of June. I want to kick off the
> discussion here so nothing will be set to stone really but I think we
> should plan to release somewhere near the end of the year?!
>
>
> simon
>




Re: Moving towards Lucene 4.0

2011-05-16 Thread Shai Erera
>
> I think we should also start planning for Lucene 4.0 soon.
>

+1 !

I think we should focus on everything that's *infrastructure* in 4.0, so
that we can develop additional features in subsequent 4.x releases. If we
end up releasing 4.0 just to discover many things will need to wait until 5.0,
it'll be a big loss.

So Codecs seem like *infra* to me, and can we make sure the necessary API is
in place for RT Search and stuff? I think a lot of the new API in 4.0 is
@lucene.experimental anyway?

In short, if we have enough API support in 4.0 already, we can release it
and develop features in 4.x releases. The only thing we should 'push' is
stuff that requires serious API changes (I doubt there are many like that,
maybe just Codecs support for the stuff you mentioned).

Shai

On Mon, May 16, 2011 at 2:52 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> Hey folks,
>
> we just started the discussion about Lucene 3.2 and releasing more
> often. Yet, I think we should also start planning for Lucene 4.0 soon.
> We have tons of stuff in trunk that people want to have and we can't
> just keep on talking about it - we need to push this out to our users.
> From my perspective we should decide on at least the big outstanding
> issues like:
>
> - BulkPostings (my +1 since I want to enable positional scoring on all
> queries)
> - DocValues (pretty close)
> - FlexibleScoring (+- 0 I think we should wait how gsoc turns out and
> decide then?)
> - Codec Support for Stored Fields, Norms & TV (not sure about that but
> seems doable at least an API and current impl as default)
> - Realtime Search aka. Searchable Ram Buffer (this seems quite far
> though while I would love to have it it seems we need to push this
> to > 4.0)
>
> For DocValues the decision seems easy since we are very close with
> that and I expect it to land until end of June. I want to kick off the
> discussion here so nothing will be set to stone really but I think we
> should plan to release somewhere near the end of the year?!
>
>
> simon
>


Moving towards Lucene 4.0

2011-05-16 Thread Simon Willnauer
Hey folks,

we just started the discussion about Lucene 3.2 and releasing more
often. Yet, I think we should also start planning for Lucene 4.0 soon.
We have tons of stuff in trunk that people want to have and we can't
just keep on talking about it - we need to push this out to our users.
From my perspective we should decide on at least the big outstanding
issues like:

- BulkPostings (my +1 since I want to enable positional scoring on all queries)
- DocValues (pretty close)
- FlexibleScoring (+- 0; I think we should wait to see how GSoC turns
out and decide then?)
- Codec Support for Stored Fields, Norms & TV (not sure about that, but
at least an API with the current impl as the default seems doable)
- Realtime Search aka. Searchable RAM Buffer (this seems quite far off,
though; while I would love to have it, it seems we need to push this
to > 4.0)

For DocValues the decision seems easy since we are very close with
that and I expect it to land by the end of June. I want to kick off the
discussion here, so nothing is set in stone yet, but I think we
should plan to release somewhere near the end of the year?!


simon
