[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846261#action_12846261
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

I think the easiest way to test out the concurrency is to add a
flush method to ByteBlockPool. Then allocate a read-only version
of the buffers array (not copying the byte arrays, just the
first-dimension pointers). The only issue is to rework the code to
read from the read-only array, and write to the write-only
array...
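
A minimal sketch of the idea in this comment, assuming a simplified stand-in for the pool. The class name ByteBlockPoolSketch, its fields, and the readView() method are hypothetical and are not Lucene's actual ByteBlockPool API. flush() copies only the first-dimension pointers, so the reader and the writer still share the underlying byte blocks.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a block pool; NOT Lucene's ByteBlockPool. */
final class ByteBlockPoolSketch {
    private final int blockSize;
    // Write-side array: only the indexing thread touches it.
    private byte[][] buffers = new byte[4][];
    private int bufferCount = 0; // number of allocated blocks
    private int upto;            // next write offset in the current block
    // Read-side array: replaced wholesale on flush(); its pointers are not changed afterwards.
    private volatile byte[][] readBuffers = new byte[0][];

    ByteBlockPoolSketch(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
        this.upto = blockSize; // forces allocation of a block on the first write
    }

    /** Appends one byte, allocating a new block when the current one is full. */
    void writeByte(byte b) {
        if (upto == blockSize) {
            if (bufferCount == buffers.length) {
                buffers = Arrays.copyOf(buffers, buffers.length * 2);
            }
            buffers[bufferCount++] = new byte[blockSize];
            upto = 0;
        }
        buffers[bufferCount - 1][upto++] = b;
    }

    /** Publishes a read-side copy of the first-dimension pointers only. */
    byte[][] flush() {
        byte[][] snapshot = Arrays.copyOf(buffers, bufferCount); // shallow copy
        readBuffers = snapshot;
        return snapshot;
    }

    /** Returns the most recently published read-side array. */
    byte[][] readView() {
        return readBuffers;
    }

    public static void main(String[] args) {
        ByteBlockPoolSketch pool = new ByteBlockPoolSketch(2);
        for (byte b = 1; b <= 3; b++) {
            pool.writeByte(b);
        }
        byte[][] view = pool.flush();
        pool.writeByte((byte) 4);
        pool.writeByte((byte) 5); // allocates a third block the old view does not see
        System.out.println("blocks visible to readers: " + view.length);
    }
}
```

Because the blocks themselves are shared, bytes written into the last block after flush() can still reach readers, so a real implementation would presumably also need to publish a safe read limit along with the pointer array.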

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 3.0.1
> Reporter: Jason Rutherglen
> Assignee: Michael Busch
> Fix For: 3.1
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846220#action_12846220
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Michael, agreed. Can you outline how you think we should proceed, then?

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 3.1
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
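
A rough sketch of the summary above, under loud assumptions: every name here (PrivateRamSegmentsSketch, RamSegment, addDocument) is hypothetical, documents are plain strings, and "flushing" only counts and clears the buffered docs. It illustrates N independent in-RAM segments that can each flush on their own; it is not the DocumentsWriter design from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of N independent RAM segments; not Lucene code. */
final class PrivateRamSegmentsSketch {

    /** One private in-RAM segment with its own buffer and its own flush. */
    static final class RamSegment {
        private final List<String> bufferedDocs = new ArrayList<>();
        private int flushedSegments = 0;

        synchronized void add(String doc) {
            bufferedDocs.add(doc);
        }

        /** Flushes only this segment and returns how many docs it held. */
        synchronized int flush() {
            int n = bufferedDocs.size();
            if (n > 0) {
                bufferedDocs.clear(); // stand-in for writing a segment to the directory
                flushedSegments++;
            }
            return n;
        }

        synchronized int flushedSegments() {
            return flushedSegments;
        }
    }

    private final RamSegment[] segments;
    private final AtomicInteger nextSlot = new AtomicInteger();
    private final ThreadLocal<Integer> slot;

    PrivateRamSegmentsSketch(int n) {
        segments = new RamSegment[n];
        for (int i = 0; i < n; i++) {
            segments[i] = new RamSegment();
        }
        // Each thread is bound to one segment, assigned round-robin on first use.
        slot = ThreadLocal.withInitial(() -> Math.floorMod(nextSlot.getAndIncrement(), n));
    }

    /** Adds the doc to the calling thread's segment. */
    void addDocument(String doc) {
        segments[slot.get()].add(doc);
    }

    RamSegment segment(int i) {
        return segments[i];
    }

    public static void main(String[] args) throws InterruptedException {
        PrivateRamSegmentsSketch writer = new PrivateRamSegmentsSketch(2);
        Thread t1 = new Thread(() -> writer.addDocument("doc from thread 1"));
        Thread t2 = new Thread(() -> writer.addDocument("doc from thread 2"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Each segment flushes independently of the other.
        System.out.println(writer.segment(0).flush() + writer.segment(1).flush());
    }
}
```

In this sketch, contention is limited to threads that happen to share a slot, which is the isolation the summary is after. Merging the flushed segments is left out entirely.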




[jira] Issue Comment Edited: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846189#action_12846189
 ] 

Uwe Schindler edited comment on LUCENE-2326 at 3/16/10 11:17 PM:
-

Here is the patch. Before applying it, do the following (in the main checkout folder):

{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src backwards/src
svn propset svn:externals "data -r500 svn://svn.tartarus.org/snowball/trunk/data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}

Then apply the patch and run svn up.

  was (Author: thetaphi):
Here is the patch. Before applying it, do the following (in the main checkout folder):

{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src backwards/src
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}

Then apply the patch and run svn up.
  
> Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards 
> branch and linking snowball tests by svn:externals
> ---
>
> Key: LUCENE-2326
> URL: https://issues.apache.org/jira/browse/LUCENE-2326
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: Flex Branch, 3.1
>
> Attachments: LUCENE-2326.patch
>
>
> Because we often need to update the backwards tests together with trunk, and
> always have to update the branch first, record the revision number, and update
> build.xml, I would simply like to do an svn copy/move of the backwards branch.
> After a release, this is all that needs to be done:
> {code}
> svn rm backwards
> svn cp releasebranch backwards
> {code}
> This way we can commit in one pass and create patches in one pass.
> The snowball tests are currently downloaded by svn.exe, too. These need a
> fixed version for checkout. I would like to change this to use svn:externals.
> Will provide a patch soon.




[jira] Commented: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846198#action_12846198
 ] 

Uwe Schindler commented on LUCENE-2326:
---

I added one thing (as discussed with rmuir):
Because the snowball test data is too large, I excluded it from the src jar. The
test will not fail but will instead print a warning that the data is missing, so
it will also pass if, e.g., Hudson fails to check out the external svn repo.
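
The behavior described above could look roughly like the following sketch. It is not the code from LUCENE-2326.patch: the class OptionalTestDataSketch and the method runIfDataPresent are hypothetical, and the data directory path is an assumption derived from the svn:externals target used in this issue.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of "warn and pass" when optional test data is absent. */
final class OptionalTestDataSketch {

    /**
     * Runs the test body only if the data directory exists.
     * Returns true if the body ran, false if it was skipped with a warning.
     */
    static boolean runIfDataPresent(Path dataDir, Runnable testBody) {
        if (!Files.isDirectory(dataDir)) {
            System.err.println("WARNING: test data not found at " + dataDir + "; skipping test");
            return false; // skipped, but not reported as a failure
        }
        testBody.run();
        return true;
    }

    public static void main(String[] args) {
        // Assumed location: the "data" external checked out under the snowball test package.
        Path dataDir = Paths.get(
            "contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball/data");
        runIfDataPresent(dataDir, () -> System.out.println("running snowball vocabulary checks"));
    }
}
```

The trade-off is that a missing checkout silently reduces coverage, which is why the warning is printed rather than swallowed.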




[jira] Issue Comment Edited: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846189#action_12846189
 ] 

Uwe Schindler edited comment on LUCENE-2326 at 3/16/10 11:01 PM:
-

Here is the patch. Before applying it, do the following (in the main checkout folder):

{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src backwards/src
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}

Then apply the patch and run svn up.

  was (Author: thetaphi):
Here is the patch. Before applying it, do the following (in the main checkout folder):

{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src .
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}

Then apply the patch and run svn up.
  



Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
Duh -- I meant to reply to Hoss' proposal, below:

On Tue, Mar 16, 2010 at 5:55 PM, Michael McCandless
 wrote:
> +1
>
> I like this proposal!
>
> I agree we should not preclude the future (modules), let's just not
> hold up dev today until we solve it.
>
> I agree your side by side solution would allow for us to later factor
> up modules (eg analyzers).
>
> Mike
>
> On Tue, Mar 16, 2010 at 5:47 PM, Michael McCandless
>  wrote:
>> But it's actually the reverse?  Solr depends on Lucene but not vice versa.
>>
>> (If instead I proposed making Solr a subdir of Lucene then I'd agree)
>>
>> So... if you check out only lucene, you can cd there and do all you do
>> today with Lucene ("ant test", "ant dist", "svn diff", etc.).
>>
>> If you check out solr, you can cd there and "ant test" will run all of
>> Lucene's and all of Solr's tests.  "svn diff" will include any changes
>> to lucene and to solr.
>>
>> I.e., this achieves what we want -- Solr to depend on Lucene but not vice
>> versa, right?
>>
>> Mike
>>
>> On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera  wrote:
>>> I have to agree w/ Jake that putting Lucene under Solr gives the impression
>>> as if suddenly Lucene became dependent on it ... and for really no good
>>> reasons. Are we making that decision to simplify the build of Solr? What are
>>> the problems Solr faces today w.r.t. its build and using a Lucene release or
>>> trunk revision?
>>>
>>> I didn't follow the Lucene/Solr merge on general@, because I didn't even
>>> know such a beast exists. So I guess I'm missing something ...
>>>
>>> Shai
>>>
>>> On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix  wrote:

>>>> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley  wrote:
>>>>>
>>>>> > Chiming in just a bit here - isn't there any concern that independent
>>>>> > of whether or not people "can" build lucene without checking out solr,
>>>>> > the mere fact that Lucene will be effectively a "subdirectory" of
>>>>> > solr...  is there no concern that there will then be a perception
>>>>> > that Lucene is a subproject of Solr, instead of vice-versa?
>>>>>
>>>>> Who would have this perception?
>>>>> Casual users will be using downloads.
>>>>
>>>> Developers and dev managers at companies doing build vs. buy decisions
>>>> regarding whether they will do one of the following:
>>>> 1) pay big bucks to get FAST or whatever
>>>> 2) use Solr (free/cheap!)
>>>> 3) pay [variable] bucks to build their own with Lucene
>>>> 4) pay [variable but high] to build their own from scratch
>>>> I'm not concerned with casual downloaders.  I'm talking about the
>>>> companies and people who may or may not be interested in making
>>>> multi-million dollar decisions regarding using or not using Lucene
>>>> or Solr.
>>>>   -jake
>>>
>>
>




Re: lucene and solr trunk

2010-03-16 Thread Michael Busch
What about tagging and branching?  When we cut a Lucene release, do we also
tag Solr, even though it's not being released?


 Michael

On 3/16/10 3:47 PM, Michael McCandless wrote:

> But it's actually the reverse?  Solr depends on Lucene but not vice versa.
>
> (If instead I proposed making Solr a subdir of Lucene then I'd agree)
>
> So... if you check out only lucene, you can cd there and do all you do
> today with Lucene ("ant test", "ant dist", "svn diff", etc.).
>
> If you check out solr, you can cd there and "ant test" will run all of
> Lucene's and all of Solr's tests.  "svn diff" will include any changes
> to lucene and to solr.
>
> I.e., this achieves what we want -- Solr to depend on Lucene but not vice
> versa, right?
>
> Mike
>
> On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera  wrote:
>> I have to agree w/ Jake that putting Lucene under Solr gives the impression
>> as if suddenly Lucene became dependent on it ... and for really no good
>> reasons. Are we making that decision to simplify the build of Solr? What are
>> the problems Solr faces today w.r.t. its build and using a Lucene release or
>> trunk revision?
>>
>> I didn't follow the Lucene/Solr merge on general@, because I didn't even
>> know such a beast exists. So I guess I'm missing something ...
>>
>> Shai
>>
>> On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix  wrote:
>>>
>>> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley  wrote:
>>>>
>>>> > Chiming in just a bit here - isn't there any concern that independent
>>>> > of whether or not people "can" build lucene without checking out solr,
>>>> > the mere fact that Lucene will be effectively a "subdirectory" of
>>>> > solr...  is there no concern that there will then be a perception
>>>> > that Lucene is a subproject of Solr, instead of vice-versa?
>>>>
>>>> Who would have this perception?
>>>> Casual users will be using downloads.
>>>
>>> Developers and dev managers at companies doing build vs. buy decisions
>>> regarding whether they will do one of the following:
>>> 1) pay big bucks to get FAST or whatever
>>> 2) use Solr (free/cheap!)
>>> 3) pay [variable] bucks to build their own with Lucene
>>> 4) pay [variable but high] to build their own from scratch
>>> I'm not concerned with casual downloaders.  I'm talking about the
>>> companies and people who may or may not be interested in making
>>> multi-million dollar decisions regarding using or not using Lucene
>>> or Solr.
>>>   -jake


   






Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
+1

I like this proposal!

I agree we should not preclude the future (modules), let's just not
hold up dev today until we solve it.

I agree your side by side solution would allow for us to later factor
up modules (eg analyzers).

Mike

On Tue, Mar 16, 2010 at 5:47 PM, Michael McCandless
 wrote:
> But it's actually the reverse?  Solr depends on Lucene but not vice versa.
>
> (If instead I proposed making Solr a subdir of Lucene then I'd agree)
>
> So... if you check out only lucene, you can cd there and do all you do
> today with Lucene ("ant test", "ant dist", "svn diff", etc.).
>
> If you check out solr, you can cd there and "ant test" will run all of
> Lucene's and all of Solr's tests.  "svn diff" will include any changes
> to lucene and to solr.
>
> I.e., this achieves what we want -- Solr to depend on Lucene but not vice
> versa, right?
>
> Mike
>
> On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera  wrote:
>> I have to agree w/ Jake that putting Lucene under Solr gives the impression
>> as if suddenly Lucene became dependent on it ... and for really no good
>> reasons. Are we making that decision to simplify the build of Solr? What are
>> the problems Solr faces today w.r.t. its build and using a Lucene release or
>> trunk revision?
>>
>> I didn't follow the Lucene/Solr merge on general@, because I didn't even
>> know such a beast exists. So I guess I'm missing something ...
>>
>> Shai
>>
>> On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix  wrote:
>>>
>>> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley  wrote:

>>>> > Chiming in just a bit here - isn't there any concern that independent
>>>> > of whether or not people "can" build lucene without checking out solr,
>>>> > the mere fact that Lucene will be effectively a "subdirectory" of
>>>> > solr...  is there no concern that there will then be a perception
>>>> > that Lucene is a subproject of Solr, instead of vice-versa?
>>>>
>>>> Who would have this perception?
>>>> Casual users will be using downloads.
>>>
>>> Developers and dev managers at companies doing build vs. buy decisions
>>> regarding
>>> whether they will do one of the following:
>>> 1) pay big bucks to get FAST or whatever
>>> 2) use Solr (free/cheap!)
>>> 3) pay [variable] bucks to build their own with Lucene
>>> 4) pay [variable but high] to build their own from scratch
>>> I'm not concerned with casual downloaders.  I'm talking about the
>>> companies and people who
>>> may or may not be interested in making multi-million dollar decisions
>>> regarding using or
>>> not using Lucene or Solr.
>>>   -jake
>>
>




[jira] Updated: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2326:
--

Attachment: LUCENE-2326.patch

Here is the patch. Before applying it, do the following (in the main checkout folder):

{noformat}
ant clean-backwards
svn mkdir ./backwards
svn cp https://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0_back_compat_tests/src .
svn propset svn:externals "-r500 svn://svn.tartarus.org/snowball/trunk/data data" contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
svn propdel svn:ignore contrib/analyzers/common/src/test/org/apache/lucene/analysis/snowball
{noformat}

Then apply the patch and run svn up.




Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
But it's actually the reverse?  Solr depends on Lucene but not vice versa.

(If instead I proposed making Solr a subdir of Lucene then I'd agree)

So... if you check out only lucene, you can cd there and do all you do
today with Lucene ("ant test", "ant dist", "svn diff", etc.).

If you check out solr, you can cd there and "ant test" will run all of
Lucene's and all of Solr's tests.  "svn diff" will include any changes
to lucene and to solr.

I.e., this achieves what we want -- Solr to depend on Lucene but not vice
versa, right?

Mike

On Tue, Mar 16, 2010 at 5:18 PM, Shai Erera  wrote:
> I have to agree w/ Jake that putting Lucene under Solr gives the impression
> as if suddenly Lucene became dependent on it ... and for really no good
> reasons. Are we making that decision to simplify the build of Solr? What are
> the problems Solr faces today w.r.t. its build and using a Lucene release or
> trunk revision?
>
> I didn't follow the Lucene/Solr merge on general@, because I didn't even
> know such a beast exists. So I guess I'm missing something ...
>
> Shai
>
> On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix  wrote:
>>
>> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley  wrote:
>>>
>>> > Chiming in just a bit here - isn't there any concern that independent
>>> > of
>>> > whether or not people "can"
>>> > build lucene without checking out solr, the mere fact that Lucene will
>>> > be
>>> > effectively a "subdirectory"
>>> > of solr...  is there no concern that there will then be a perception
>>> > that Lucene is a subproject of
>>> > Solr, instead of vice-versa?
>>>
>>> Who would have this perception?
>>> Casual users will be using downloads.
>>
>> Developers and dev managers at companies doing build vs. buy decisions
>> regarding
>> whether they will do one of the following:
>> 1) pay big bucks to get FAST or whatever
>> 2) use Solr (free/cheap!)
>> 3) pay [variable] bucks to build their own with Lucene
>> 4) pay [variable but high] to build their own from scratch
>> I'm not concerned with casual downloaders.  I'm talking about the
>> companies and people who
>> may or may not be interested in making multi-million dollar decisions
>> regarding using or
>> not using Lucene or Solr.
>>   -jake
>




Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
Dev is now merged with Solr and Lucene -- that has already passed.  If
that will scare customers away, that's a risk we take -- the benefits
of merged dev outweigh that, in my opinion.

The incremental risk that the details of our svn URLs will scare
people away seems negligible.

And we can always change this up, later, if we decide to.

I think what's important now is that we pick something to unblock trunk
dev.  Sure people can keep working on the branch but I think it'd be
better if we get this simple "svn move" done so that we can get normal
dev going on a shared trunk again.

Mike

On Tue, Mar 16, 2010 at 5:28 PM, Jake Mannix  wrote:
>
> On Tue, Mar 16, 2010 at 3:10 PM, Yonik Seeley  wrote:
>>
>> On Tue, Mar 16, 2010 at 6:01 PM, Jake Mannix 
>> wrote:
>> > I'm not concerned with casual downloaders.  I'm talking
>>
>> > about the companies and people who may or may not be
>>
>> > interested in making multi-million dollar decisions regarding
>>
>> > using or not using Lucene or Solr.
>>
>> Heh - multi-million dollar decisions after a quick glance at an SVN url?
>
> Clearly not.  But just as I think that making the development of
> both solr and lucene easier is a noble goal, I think that giving
> people the impression that by choosing to "go with Lucene"
> *means* they "go with Solr" as their end solution is not what
> we want to do.  There are some places where Solr is just not
> appropriate but Lucene may be.
> Will this impression be "caused" by a SVN directory url
> alone? Of course not.  Merging committer lists, locked
> releases, *and* a SVN url which shows this?  Yes, I
> think the kinds of VPs and CTOs I've talked to and
> tried to help decide whether to go with an open-source
> search solution could indeed start to get the feeling that
> there's really just one apache solution, the
> "Solr/Lucene solution".  And if they look into Solr and
> decide that this particular application is not for them,
> they may then not look deep enough to see whether
> doing a custom Lucene application *would be*.
>   -jake




Re: lucene and solr trunk

2010-03-16 Thread Jake Mannix
On Tue, Mar 16, 2010 at 3:10 PM, Yonik Seeley  wrote:

> On Tue, Mar 16, 2010 at 6:01 PM, Jake Mannix 
> wrote:
> > I'm not concerned with casual downloaders.  I'm talking
>
> about the companies and people who may or may not be
>
> interested in making multi-million dollar decisions regarding
>
> using or not using Lucene or Solr.
>
> Heh - multi-million dollar decisions after a quick glance at an SVN url?
>

Clearly not.  But just as I think that making the development of
both solr and lucene easier is a noble goal, I think that giving
people the impression that by choosing to "go with Lucene"
*means* they "go with Solr" as their end solution is not what
we want to do.  There are some places where Solr is just not
appropriate but Lucene may be.

Will this impression be "caused" by a SVN directory url
alone? Of course not.  Merging committer lists, locked
releases, *and* a SVN url which shows this?  Yes, I
think the kinds of VPs and CTOs I've talked to and
tried to help decide whether to go with an open-source
search solution could indeed start to get the feeling that
there's really just one apache solution, the
"Solr/Lucene solution".  And if they look into Solr and
decide that this particular application is not for them,
they may then not look deep enough to see whether
doing a custom Lucene application *would be*.

  -jake


Re: lucene and solr trunk

2010-03-16 Thread Shai Erera
I have to agree w/ Jake that putting Lucene under Solr gives the impression
as if suddenly Lucene became dependent on it ... and for really no good
reasons. Are we making that decision to simplify the build of Solr? What are
the problems Solr faces today w.r.t. its build and using a Lucene release or
trunk revision?

I didn't follow the Lucene/Solr merge on general@, because I didn't even
know such a beast exists. So I guess I'm missing something ...

Shai

On Wed, Mar 17, 2010 at 12:01 AM, Jake Mannix  wrote:

> On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley  wrote:
>>
>> > Chiming in just a bit here - isn't there any concern that independent
>> > of whether or not people "can" build lucene without checking out solr,
>> > the mere fact that Lucene will be effectively a "subdirectory" of
>> > solr...  is there no concern that there will then be a perception
>> > that Lucene is a subproject of Solr, instead of vice-versa?
>>
>> Who would have this perception?
>> Casual users will be using downloads.
>>
>
> Developers and dev managers at companies doing build vs. buy decisions
> regarding
> whether they will do one of the following:
>
> 1) pay big bucks to get FAST or whatever
> 2) use Solr (free/cheap!)
> 3) pay [variable] bucks to build their own with Lucene
> 4) pay [variable but high] to build their own from scratch
>
> I'm not concerned with casual downloaders.  I'm talking about the companies
> and people who
> may or may not be interested in making multi-million dollar decisions
> regarding using or
> not using Lucene or Solr.
>
>   -jake
>


Re: lucene and solr trunk

2010-03-16 Thread Yonik Seeley
On Tue, Mar 16, 2010 at 6:01 PM, Jake Mannix  wrote:
> I'm not concerned with casual downloaders.  I'm talking about the companies
> and people who
> may or may not be interested in making multi-million dollar decisions
> regarding using or
> not using Lucene or Solr.

Heh - multi-million dollar decisions after a quick glance at an SVN url?

-Yonik




Re: lucene and solr trunk

2010-03-16 Thread Jake Mannix
On Tue, Mar 16, 2010 at 2:53 PM, Yonik Seeley  wrote:
>
> > Chiming in just a bit here - isn't there any concern that independent of
> > whether or not people "can" build lucene without checking out solr, the
> > mere fact that Lucene will be effectively a "subdirectory" of solr...
> > is there no concern that there will then be a perception that Lucene is
> > a subproject of Solr, instead of vice-versa?
>
> Who would have this perception?
> Casual users will be using downloads.
>

Developers and dev managers at companies doing build vs. buy decisions
regarding whether they will do one of the following:

1) pay big bucks to get FAST or whatever
2) use Solr (free/cheap!)
3) pay [variable] bucks to build their own with Lucene
4) pay [variable but high] to build their own from scratch

I'm not concerned with casual downloaders.  I'm talking about the companies
and people who may or may not be interested in making multi-million dollar
decisions regarding using or not using Lucene or Solr.

  -jake


Re: lucene and solr trunk

2010-03-16 Thread Shai Erera
Where would the modules live?

I'm not sure if I sent it on this thread or somewhere else, but what about
my proposal to have all three sitting under their own directories, w/ their
own trunk/branch/tags, and if it's easier for dev then put all three under
one root (for permission management maybe)?

Shai

On Tue, Mar 16, 2010 at 11:53 PM, Yonik Seeley  wrote:

> On Tue, Mar 16, 2010 at 5:42 PM, Jake Mannix 
> wrote:
> > On Tue, Mar 16, 2010 at 2:31 PM, Michael McCandless
> >  wrote:
> >>
> >> If we move lucene under Solr's existing svn path, ie:
> >>
> >>  /solr/trunk/lucene
> >
> > Chiming in just a bit here - isn't there any concern that independent of
> > whether or not people "can"
> > build lucene without checking out solr, the mere fact that Lucene will be
> > effectively a "subdirectory"
> > of solr...  is there no concern that there will then be a perception that
> > Lucene is a subproject of
> > Solr, instead of vice-versa?
>
> Who would have this perception?
> Casual users will be using downloads.
>
> Likewise, should solr be concerned that it's currently under a lucene
> URL?  How many casual users actually understand the difference between
> the lucene TLP and the lucene java subproject?
>
> This is really about what makes most sense for development.
>
> -Yonik
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>


Re: lucene and solr trunk

2010-03-16 Thread Yonik Seeley
On Tue, Mar 16, 2010 at 5:42 PM, Jake Mannix  wrote:
> On Tue, Mar 16, 2010 at 2:31 PM, Michael McCandless
>  wrote:
>>
>> If we move lucene under Solr's existing svn path, ie:
>>
>>  /solr/trunk/lucene
>
> Chiming in just a bit here - isn't there any concern that independent of
> whether or not people "can"
> build lucene without checking out solr, the mere fact that Lucene will be
> effectively a "subdirectory"
> of solr...  is there no concern that there will then be a perception that 
> Lucene is a subproject of
> Solr, instead of vice-versa?

Who would have this perception?
Casual users will be using downloads.

Likewise, should solr be concerned that it's currently under a lucene
URL?  How many casual users actually understand the difference between
the lucene TLP and the lucene java subproject?

This is really about what makes most sense for development.

-Yonik

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: lucene and solr trunk

2010-03-16 Thread Jake Mannix
On Tue, Mar 16, 2010 at 2:31 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> If we move lucene under Solr's existing svn path, ie:
>
>  /solr/trunk/lucene


Chiming in just a bit here - isn't there any concern that independent of
whether or not people "can" build lucene without checking out solr, the mere
fact that Lucene will be effectively a "subdirectory" of solr... is there no
concern that there will then be a perception that Lucene is a subproject of
Solr, instead of vice-versa?

The way mavenified projects work is that there would instead be a top level
in which both solr and lucene would be submodules (and thus also
subdirectories in svn), with a dependency from solr to lucene (in the pom.xml
for maven, but easy enough to do with the build.xml with ant).  Checking out
solr without lucene should be doable (using snapshot jars from lucene trunk
nightly, maybe?), and the reverse should be easy, as could be checking out the
top-level and getting everything (including a top-level build.xml which
<ant>'s or antcall's into the subdirectory build.xmls).

It seems really weird to have Lucene appear as a subdirectory of Solr,
especially for people out there who aren't using Solr.

  -jake


Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
The primary concern seems to be ensuring that, once we
merge svn, one can still checkout & build & run tests/etc for
Lucene alone.

If we move lucene under Solr's existing svn path, ie:

  /solr/trunk/lucene

and then fixup solr's build files to go and compile sources from the
lucene dir, run tests there, etc., then, one can still checkout & run
lucene fully independently -- this addresses that concern?

So how about we start with this approach?  Progress not
perfection...  If somehow this layout is a problem then we can just
move things around, again.

A lot of great progress has already been made on the temporary branch
-- Solr runs fine on Lucene trunk!  And, also on flex.  We need to
settle an initial svn structure so the changes on the branch can
be fully reviewed & then committed to trunk and normal dev can
proceed...

We don't need to solve how modules/contribs, etc., are going to be
fixed, now -- that all can come later.  IRC issues, using GIT instead,
etc. should also be discussed separately.  Let's just pick a place in
svn and free up ongoing dev...

Mike




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846128#action_12846128
 ] 

Michael Busch commented on LUCENE-2324:
---

I think we all agree that we want to have a single writer thread, multi reader 
thread model.  Only then can the thread-safety problems in LUCENE-2312 be 
reduced to visibility (no write-locking).  So I think making this change first 
makes most sense.  It involves a bit of boring refactoring work, unfortunately. 
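
To illustrate what "reduced to visibility" could mean in practice, here is a minimal Java sketch of a single-writer, multi-reader structure. It is not code from LUCENE-2324 or LUCENE-2312; the class names SingleWriterIntLog and SingleWriterDemo are hypothetical, and the fixed capacity is an assumption made to keep the example short. The writer never takes a lock: it fills a slot and then publishes it with one volatile write, and readers only touch slots below the published count.

```java
/** Hypothetical append-only log: exactly one writer thread, any number of readers. */
final class SingleWriterIntLog {
    private final int[] slots;
    // Count of slots that readers may safely read. Only the writer updates it.
    private volatile int published = 0;

    SingleWriterIntLog(int capacity) {
        slots = new int[capacity];
    }

    /** Call from the single writer thread only; no lock is taken. */
    void append(int value) {
        int next = published;
        if (next == slots.length) {
            throw new IllegalStateException("log is full");
        }
        slots[next] = value;   // plain write into a slot no reader looks at yet
        published = next + 1;  // volatile write makes the slot visible to readers
    }

    /** Safe from any thread. */
    int size() {
        return published;
    }

    /** Safe from any thread for indexes below size(). */
    int get(int index) {
        // The volatile read of 'published' happens before the slot read.
        if (index < 0 || index >= published) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return slots[index];
    }
}

final class SingleWriterDemo {
    public static void main(String[] args) throws InterruptedException {
        final SingleWriterIntLog log = new SingleWriterIntLog(1000);
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) log.append(i);
        });
        Thread reader = new Thread(() -> {
            int seen = 0;
            while (seen < 1000) {
                int size = log.size();
                for (; seen < size; seen++) {
                    if (log.get(seen) != seen) {
                        throw new AssertionError("stale read at " + seen);
                    }
                }
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("reader observed " + log.size() + " values");
    }
}
```

With more than one writer thread this sketch would need a lock or a compare-and-set around append, which is the cost the single-writer model is meant to avoid.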

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Yonik Seeley
IRC has been discussed to death at Apache:
http://markmail.org/search/?q=IRC+list%3Aorg.apache.incubator.general

Look for the spikes... like this:

http://markmail.org/search/?q=IRC+list%3Aorg.apache.incubator.general#query:IRC%20list%3Aorg.apache.incubator.general%20date%3A200608%20+page:1+state:facets

-Yonik




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846112#action_12846112
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Actually, TermsHashField doesn't need to be concurrent: it's only being written 
to, and the terms concurrent skiplist (was a btree) holds the reference to the 
posting list.  So I think we're good there because terms enum never accesses 
the terms hash.  Nice!
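
A rough Java sketch of that arrangement follows. It is only an illustration under assumptions: the class SortedTermsSketch is hypothetical, terms are plain Strings, and the "reference to the posting list" is modeled as an int offset. The only real API used is java.util.concurrent.ConcurrentSkipListMap, which keeps keys sorted and allows reads while a writer inserts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sorted terms dictionary shared by one writer and many readers. */
final class SortedTermsSketch {
    // term -> postings reference (assumed here to be an int offset into a pool)
    private final ConcurrentSkipListMap<String, Integer> terms =
            new ConcurrentSkipListMap<>();

    /** Writer side: register a term the first time it is seen. */
    void addTerm(String term, int postingsOffset) {
        terms.putIfAbsent(term, postingsOffset);
    }

    /** Reader side: up to 'limit' terms in sorted order, starting at 'start'. */
    List<String> termsFrom(String start, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailMap(start, true).keySet()) {
            if (out.size() == limit) break;
            out.add(term);
        }
        return out;
    }

    /** Reader side: postings reference for a term, or null if unknown. */
    Integer postingsOffset(String term) {
        return terms.get(term);
    }
}

final class SortedTermsDemo {
    public static void main(String[] args) {
        SortedTermsSketch dict = new SortedTermsSketch();
        dict.addTerm("solr", 0);
        dict.addTerm("lucene", 1);
        dict.addTerm("ant", 2);
        System.out.println(dict.termsFrom("b", 10));
    }
}
```

In this sketch the writer's private hash table never appears: readers enumerate terms through the skip list alone, which is the property the comment above relies on.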






[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846110#action_12846110
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

NormsWriterPerField has a growing norm byte array; we'd need a way to 
read/write lock it... 

I think we have concurrency issues in the TermsHash table?  Maybe it'd need to 
be rewritten to use ConcurrentHashMap?
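
For the growing norms array, one possible shape of the read/write locking is sketched below in Java. This is an assumption-laden illustration, not NormsWriterPerField itself: the class GrowableNorms, its initial size of 4, and the doubling growth policy are all made up for the example. It uses java.util.concurrent.locks.ReentrantReadWriteLock so that readers never see the array while it is being swapped for a larger one.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical growable per-document norms array guarded by a read/write lock. */
final class GrowableNorms {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private byte[] norms = new byte[4];
    private int size = 0; // one past the highest doc id that has been written

    /** Writer side: store a norm, growing the array when the doc id does not fit. */
    void setNorm(int docId, byte norm) {
        lock.writeLock().lock();
        try {
            if (docId >= norms.length) {
                int newLength = Math.max(docId + 1, norms.length * 2);
                norms = Arrays.copyOf(norms, newLength); // new slots default to 0
            }
            norms[docId] = norm;
            size = Math.max(size, docId + 1);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reader side: many readers may hold the read lock at the same time. */
    byte getNorm(int docId) {
        lock.readLock().lock();
        try {
            if (docId < 0 || docId >= size) {
                throw new IndexOutOfBoundsException("doc " + docId);
            }
            return norms[docId];
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return size;
        } finally {
            lock.readLock().unlock();
        }
    }
}

final class GrowableNormsDemo {
    public static void main(String[] args) {
        GrowableNorms norms = new GrowableNorms();
        norms.setNorm(0, (byte) 5);
        norms.setNorm(10, (byte) 7); // forces the array to grow
        System.out.println("norm of doc 10 = " + norms.getNorm(10));
    }
}
```

The lock makes every read pay for synchronization, so a design that only needs visibility guarantees (a single writer publishing finished data) may be cheaper than this.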




[jira] Commented: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846108#action_12846108
 ] 

Michael McCandless commented on LUCENE-2326:


+1

This sounds sooo much better than what we do now.

> Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards 
> branch and linking snowball tests by svn:externals
> ---
>
> Key: LUCENE-2326
> URL: https://issues.apache.org/jira/browse/LUCENE-2326
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: Flex Branch, 3.1
>
>
> As we often need to update backwards tests together with trunk and always 
> have to update the branch first, record rev no, and update build xml, I would 
> simply like to do a svn copy/move of the backwards branch.
> After a release, this is simply also done:
> {code}
> svn rm backwards
> svn cp releasebranch backwards
> {code}
> By this we can simply commit in one pass, create patches in one pass.
> The snowball tests are currently downloaded by svn.exe, too. These need a 
> fixed version for checkout. I would like to change this to use svn:externals. 
> Will provide patch, soon.




Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Michael McCandless
On Tue, Mar 16, 2010 at 2:17 PM, Michael Busch  wrote:

> But at the same time can we make sure that the decisions that are made on
> IRC are still being described in a jira issue?

+1

Any time something is discussed on IRC, it must be summarized on the
lists or in an issue, with the details based on what was discussed, or
else it didn't happen.

IRC is a great way to hash out ideas, brainstorm, shoot the breeze,
vent, etc.  Much of what's discussed doesn't pan out... but when stuff
does, we always bring it to the lists...

Those of us spending some time on IRC have been trying to do exactly
that.  Maybe we've been falling short sometimes, not providing enough
detail, so we should fix that with time.  We're all still learning as
we go...

Also: if an issue is opened and it's missing details, regardless of
whether it was born in IRC or some other place, people should simply
ask questions, punch holes, etc.  When another set of eyes, or the
same set of eyes some time later, look at the issue, very different
and healthy iterations happen.  Most certainly, if something seems like
a good idea during IRC discussions, that does not mean the debate is
done -- rather, the issue is opened and lots of other people chime in.

Nothing is "decided" on IRC... only ideas are born... that's all.

Stepping back, Lucene/Solr are clearly at a fast pace of innovation
right now, and this is really very healthy.  It'd already been fast a
few months ago, but it seems to be accelerating... I think that's
because suddenly we have quite a few strong [near-] full-time devs
here, and, because IRC allows for real-time conversations for
brainstorming.

This is net/net good for both Lucene and Solr and I think we should
try to find a way to make IRC work well so devs that do happen to
have the time (and, the list will change with time -- bright stars never
shine for long) can brainstorm and bring new ideas to the community...

Mike




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846102#action_12846102
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Michael,

For LUCENE-2312, I think the searching isn't going to be an
issue; I've got basic per thread doc writers working (though not
thoroughly tested). I didn't see a great need to rework all the
classes, and even if we did, I'm not sure it would help with the
byte array read/write issues. I'd prefer to get a proof of concept
more or less working, then refine it from there. I think there are
two main design/implementation issues before we can roll
something out:

1) A new skip list implementation that at specific intervals
writes a new skip (ie, single level). Right now in trunk we have
a multilevel skiplist that requires knowing the number of docs
ahead of time.

2) Figure out the low -> high levels of byte/char/int array
visibility to reader threads. The main challenge here is that
the DW related code that utilizes this is really hard for me to
understand well enough to know what can be changed without the
side effect being bunches of other broken stuff. If there were a
Directory-like class abstraction that we could simply override
and reimplement, we could do that, and maybe there is one; I'm
not sure yet. 

However if reworking the PerThread classes somehow makes the tie
into the IO (eg, the byte array pooling) system abstracted and
easier, then I'm all for it.
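
As a sketch of point 1, the Java class below writes a single-level skip entry every fixed number of postings, so the total number of docs does not need to be known up front. Everything here is hypothetical (the class name IntervalSkipPostings, the in-memory lists, the choice of interval) and it is single-threaded; it only shows the skipping logic, not the visibility question from point 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only postings list with a single-level, fixed-interval skip list. */
final class IntervalSkipPostings {
    private final int interval;
    private final List<Integer> docs = new ArrayList<>();
    // skipDocs.get(i) is the doc id stored at position i * interval in 'docs'.
    private final List<Integer> skipDocs = new ArrayList<>();

    IntervalSkipPostings(int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.interval = interval;
    }

    /** Append a doc id; ids must be strictly increasing. */
    void add(int docId) {
        if (!docs.isEmpty() && docId <= docs.get(docs.size() - 1)) {
            throw new IllegalArgumentException("doc ids must increase");
        }
        if (docs.size() % interval == 0) {
            skipDocs.add(docId); // a new skip entry every 'interval' postings
        }
        docs.add(docId);
    }

    /** Returns the first doc id >= target, or -1 if there is none. */
    int advance(int target) {
        int start = 0;
        // Use the skip entries to find the last block that can contain the target.
        for (int i = 0; i < skipDocs.size() && skipDocs.get(i) <= target; i++) {
            start = i * interval;
        }
        // Scan linearly from the start of that block.
        for (int pos = start; pos < docs.size(); pos++) {
            if (docs.get(pos) >= target) return docs.get(pos);
        }
        return -1;
    }

    int skipCount() {
        return skipDocs.size();
    }
}

final class IntervalSkipDemo {
    public static void main(String[] args) {
        IntervalSkipPostings postings = new IntervalSkipPostings(2);
        for (int doc : new int[] {1, 3, 5, 8, 13}) postings.add(doc);
        System.out.println("advance(4) -> " + postings.advance(4));
    }
}
```

A multilevel skip list would find the block in fewer steps, but each extra level needs to know where the lower level ends, which is the ahead-of-time requirement the comment wants to avoid.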




Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Chris Hostetter

: with, "if it didn't happen on the lists, it didn't happen". It's the same as

+1

But as the IRC channel gets used more and more, it would *also* be nice if 
there was an archive of the IRC channel so that there is a place to go 
look to understand the back story behind an idea once it's synthesized and 
posted to the lists/jira.

That's the huge advantage IRC has over informal conversations at 
hackathons, apachecon, and meetups -- there can in fact be easily 
archivable/parsable/searchable records of the communication.



-Hoss





Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Grant Ingersoll

On Mar 16, 2010, at 3:24 PM, Mark Miller wrote:

> On 03/16/2010 02:57 PM, Grant Ingersoll wrote:
>> On Mar 16, 2010, at 2:47 PM, Steven A Rowe wrote:
>> 
>>   
>>> On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
>>> 
>>>> Does anyone know how other projects fold in IRC...?
>>>
>>> I gather from the deafening silence that we'll have to figure it out as we 
>>> go...
>>> 
>>> I think some (not all) of the discomfort associated with IRC could be 
>>> addressed with a permanent, searchable, linkable archive of #lucene.
>>> 
>>> I went looking for IRC loggers and found http://colabti.org/.  One of the 
>>> things hosted there is a searchable, linkable permanent archive of several 
>>> freenode channels.  I posted on #irclogger asking about hosting #lucene 
>>> archive, and apparently all we have to do is ask, after first determining 
>>> that nobody objects.  Here's a link (not incidentally, this is exactly what 
>>> we will have for #lucene once the service is switched on):
>>> 
>>> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2
>>> 
>>> So, would anybody participating on #lucene object to a permanent archive?
>>> 
>>> (I'm also going to provide a link to this thread on #lucene to make sure 
>>> everybody there knows about the issue.)
>>> 
>> There's also a lot of chatter that happens on IRC, so logging is going to 
>> have a lot of noise.  I'm still on the fence on what to do.  I don't want to 
>> get in people's way, but we also need to have traceability about decisions, 
>> and we certainly can't have answers like "We discussed this on IRC and you 
>> missed it, too bad" happening (not saying that has happened, just saying I 
>> don't want to see it).
>> 
>> -Grant
> 
> Even with logging, I'm against using IRC for making decisions, or as
> something people can point to. Even with searchable logging, I think we
> should stick with, "if it didn't happen on the lists, it didn't happen". It's
> the same as when some of us get together and talk about Lucene and Solr -
> that's great stuff - you can get a lot done that is a lot harder on the lists
> - you can hash a lot out. But I think people should always have the right to
> act like it didn't happen - the same as if we are at ApacheCon or something -
> we don't come back and say, sorry, you missed all the discussion, but we had
> one and this is what we are going to do. We summarize the discussion on the
> list (like Mike likes to do with IRC), and answer questions as people have
> them. I personally think it's great to come to mini agreements with real-time
> talk - then it just has to make its way through the list.
>
> This isn't a counterpoint to anything you said, Grant, just a nice place for
> me to drop this.
> 


+1.  The ApacheCon talks are a great example of bringing back off list stuff to 
the list.



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846084#action_12846084
 ] 

Michael Busch commented on LUCENE-2324:
---

Shall we not first try to remove the downstream *PerThread classes and make the 
DocumentsWriter single-threaded without locking?  Then we add a 
PerThreadDocumentsWriter and DocumentsWriterThreadBinder, which talks to the 
PerThreadDWs, and IW talks to DWTB.  We can pick other names :)

When that's done we can think about what kind of 
locking/synchronization/volatile stuff we need for LUCENE-2312.
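
The binder idea can be sketched roughly as follows in Java. This is a guess at the shape only: PerThreadWriterSketch and ThreadBinderSketch are hypothetical stand-ins for the PerThreadDocumentsWriter and DocumentsWriterThreadBinder named above, documents are plain Strings, and flushing is left out entirely. Each indexing thread is bound to its own writer, so the writer itself needs no locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-thread writer: only ever touched by the thread it is bound to. */
final class PerThreadWriterSketch {
    private final List<String> docs = new ArrayList<>(); // no locking needed

    void addDocument(String doc) {
        docs.add(doc);
    }

    int numDocs() {
        return docs.size();
    }
}

/** Hypothetical binder: maps each calling thread to its own private writer. */
final class ThreadBinderSketch {
    private final ConcurrentHashMap<Thread, PerThreadWriterSketch> bindings =
            new ConcurrentHashMap<>();

    /** Returns the writer bound to the calling thread, creating it on first use. */
    PerThreadWriterSketch writerForCurrentThread() {
        return bindings.computeIfAbsent(
                Thread.currentThread(), t -> new PerThreadWriterSketch());
    }

    /** Entry point an IndexWriter-like caller would use. */
    void addDocument(String doc) {
        writerForCurrentThread().addDocument(doc);
    }

    int boundWriters() {
        return bindings.size();
    }
}

final class ThreadBinderDemo {
    public static void main(String[] args) throws InterruptedException {
        final ThreadBinderSketch binder = new ThreadBinderSketch();
        Thread a = new Thread(() -> binder.addDocument("doc from thread a"));
        Thread b = new Thread(() -> binder.addDocument("doc from thread b"));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("writers bound: " + binder.boundWriters());
    }
}
```

Keying the map by Thread keeps the sketch short but holds on to finished threads; a real binder would presumably need a policy for reusing or retiring writers, which the comment above leaves open.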




Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Mark Miller

On 03/16/2010 02:57 PM, Grant Ingersoll wrote:
> On Mar 16, 2010, at 2:47 PM, Steven A Rowe wrote:
>
>> On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
>>> Does anyone know how other projects fold in IRC...?
>>
>> I gather from the deafening silence that we'll have to figure it out as
>> we go...
>>
>> I think some (not all) of the discomfort associated with IRC could be
>> addressed with a permanent, searchable, linkable archive of #lucene.
>>
>> I went looking for IRC loggers and found http://colabti.org/.  One of the
>> things hosted there is a searchable, linkable permanent archive of several
>> freenode channels.  I posted on #irclogger asking about hosting #lucene
>> archive, and apparently all we have to do is ask, after first determining
>> that nobody objects.  Here's a link (not incidentally, this is exactly what
>> we will have for #lucene once the service is switched on):
>>
>> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2
>>
>> So, would anybody participating on #lucene object to a permanent archive?
>>
>> (I'm also going to provide a link to this thread on #lucene to make sure
>> everybody there knows about the issue.)
>
> There's also a lot of chatter that happens on IRC, so logging is going to
> have a lot of noise.  I'm still on the fence on what to do.  I don't want to
> get in people's way, but we also need to have traceability about decisions,
> and we certainly can't have answers like "We discussed this on IRC and you
> missed it, too bad" happening (not saying that has happened, just saying I
> don't want to see it).
>
> -Grant


Even with logging, I'm against using IRC for making decisions, or as 
something people can point to. Even with searchable logging, I think we 
should stick with, "if it didn't happen on the lists, it didn't happen". 
It's the same as when some of us get together and talk about Lucene and 
Solr - that's great stuff - you can get a lot done that is a lot harder 
on the lists - you can hash a lot out. But I think people should always 
have the right to act like it didn't happen - the same as if we are at 
ApacheCon or something - we don't come back and say, sorry, you missed 
all the discussion, but we had one and this is what we are going to do. We 
summarize the discussion on the list (like Mike likes to do with IRC), 
and answer questions as people have them. I personally think it's great 
to come to mini agreements with real-time talk - then it just has to 
make its way through the list.


This isn't a counterpoint to anything you said, Grant, just a nice place 
for me to drop this.


--
- Mark

http://www.lucidimagination.com







Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Michael Busch

It'd be very cool to have a searchable archive for the IRC discussions, so +1.

But at the same time can we make sure that the decisions that are made 
on IRC are still being described in a jira issue?  I don't mean that 
people should repeat brainstorming, but if a discussion leads to opening 
a Jira issue it'd be good to understand the reasons and details without 
having to search the IRC log.  Only if someone wants to know more, e.g. 
what led to the discussion, what other ideas were discarded, etc., 
should they have to go to the IRC log.


 Michael

On 3/16/10 11:58 AM, Michael McCandless wrote:
> +1, this looks great!
>
> Mike
>
> On Tue, Mar 16, 2010 at 1:52 PM, Andi Vajda wrote:
>> On Mar 16, 2010, at 11:47, Steven A Rowe wrote:
>>> On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
>>>> Does anyone know how other projects fold in IRC...?
>>>
>>> I gather from the deafening silence that we'll have to figure it out as we
>>> go...
>>>
>>> I think some (not all) of the discomfort associated with IRC could be
>>> addressed with a permanent, searchable, linkable archive of #lucene.
>>>
>>> I went looking for IRC loggers and found http://colabti.org/.  One of the
>>> things hosted there is a searchable, linkable permanent archive of several
>>> freenode channels.  I posted on #irclogger asking about hosting #lucene
>>> archive, and apparently all we have to do is ask, after first determining
>>> that nobody objects.  Here's a link (not incidentally, this is exactly what
>>> we will have for #lucene once the service is switched on):
>>>
>>> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2
>>>
>>> So, would anybody participating on #lucene object to a permanent archive?
>>
>> No objections on my part. I think this is essential.
>>
>> Andi..
>>
>>> (I'm also going to provide a link to this thread on #lucene to make sure
>>> everybody there knows about the issue.)
>>>
>>> Steve



Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Michael McCandless
+1, this looks great!

Mike

On Tue, Mar 16, 2010 at 1:52 PM, Andi Vajda  wrote:
>
> On Mar 16, 2010, at 11:47, Steven A Rowe  wrote:
>
>> On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
>>>
>>> Does anyone know how other projects fold in IRC...?
>>
>> I gather from the deafening silence that we'll have to figure it out as we
>> go...
>>
>> I think some (not all) of the discomfort associated with IRC could be
>> addressed with a permanent, searchable, linkable archive of #lucene.
>>
>> I went looking for IRC loggers and found http://colabti.org/.  One of the
>> things hosted there is a searchable, linkable permanent archive of several
>> freenode channels.  I posted on #irclogger asking about hosting #lucene
>> archive, and apparently all we have to do is ask, after first determining
>> that nobody objects.  Here's a link (not incidentally, this is exactly what
>> we will have for #lucene once the service is switched on):
>>
>> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2
>>
>> So, would anybody participating on #lucene object to a permanent archive?
>
> No objections on my part. I think this is essential.
>
> Andi..
>
>>
>> (I'm also going to provide a link to this thread on #lucene to make sure
>> everybody there knows about the issue.)
>>
>> Steve



Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Grant Ingersoll

On Mar 16, 2010, at 2:47 PM, Steven A Rowe wrote:

> On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
>> Does anyone know how other projects fold in IRC...?
> 
> I gather from the deafening silence that we'll have to figure it out as we 
> go...
> 
> I think some (not all) of the discomfort associated with IRC could be 
> addressed with a permanent, searchable, linkable archive of #lucene.
> 
> I went looking for IRC loggers and found http://colabti.org/.  One of the 
> things hosted there is a searchable, linkable permanent archive of several 
> freenode channels.  I posted on #irclogger asking about hosting #lucene 
> archive, and apparently all we have to do is ask, after first determining 
> that nobody objects.  Here's a link (not incidentally, this is exactly what 
> we will have for #lucene once the service is switched on):
> 
> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2
> 
> So, would anybody participating on #lucene object to a permanent archive?
> 
> (I'm also going to provide a link to this thread on #lucene to make sure 
> everybody there knows about the issue.)

There's also a lot of chatter that happens on IRC, so logging is going to have 
a lot of noise.  I'm still on the fence on what to do.  I don't want to get in 
people's way, but we also need to have traceability about decisions, and we 
certainly can't have answers like "We discussed this on IRC and you missed it, 
too bad" happening (not saying that has happened, just saying I don't want to 
see it).

-Grant



Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Andi Vajda


On Mar 16, 2010, at 11:47, Steven A Rowe wrote:

> On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
>> Does anyone know how other projects fold in IRC...?
>
> I gather from the deafening silence that we'll have to figure it out
> as we go...
>
> I think some (not all) of the discomfort associated with IRC could
> be addressed with a permanent, searchable, linkable archive of
> #lucene.
>
> I went looking for IRC loggers and found http://colabti.org/.  One
> of the things hosted there is a searchable, linkable permanent
> archive of several freenode channels.  I posted on #irclogger asking
> about hosting #lucene archive, and apparently all we have to do is
> ask, after first determining that nobody objects.  Here's a link
> (not incidentally, this is exactly what we will have for #lucene
> once the service is switched on):
>
> http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2
>
> So, would anybody participating on #lucene object to a permanent
> archive?

No objections on my part. I think this is essential.

Andi..

> (I'm also going to provide a link to this thread on #lucene to make
> sure everybody there knows about the issue.)
>
> Steve





#lucene IRC log [was: RE: lucene and solr trunk]

2010-03-16 Thread Steven A Rowe
On 03/16/2010 at 6:06 AM, Michael McCandless wrote:
> Does anyone know how other projects fold in IRC...?

I gather from the deafening silence that we'll have to figure it out as we go...

I think some (not all) of the discomfort associated with IRC could be addressed 
with a permanent, searchable, linkable archive of #lucene.

I went looking for IRC loggers and found http://colabti.org/.  One of the 
things hosted there is a searchable, linkable permanent archive of several 
freenode channels.  I posted on #irclogger asking about hosting #lucene 
archive, and apparently all we have to do is ask, after first determining that 
nobody objects.  Here's a link (not incidentally, this is exactly what we will 
have for #lucene once the service is switched on):

http://colabti.org/irclogger/irclogger_log/irclogger?date=2010-03-16#l2

So, would anybody participating on #lucene object to a permanent archive?

(I'm also going to provide a link to this thread on #lucene to make sure 
everybody there knows about the issue.)

Steve





[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846037#action_12846037
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Are there going to be issues with the char array buffers as well (ie, will we 
need to also flush them for concurrency?)

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




Re: lucene and solr trunk

2010-03-16 Thread Shai Erera
Hi

My only concern w/ how SVN might end up organized is that I'll still be able
to check out core lucene independently of Solr (and possibly contrib/modules)
and then build and test it. A separate project in Eclipse is important
as well.

How about this structure:
/solr/trunk
/lucene/trunk
/modules/trunk

 can be left out if we don't think it's necessary.

This should allow us to:
1) Release each and every one of them independently
2) Introduce dependencies between modules -> lucene and Solr -> modules +
lucene as, IMO, it should be. Lucene is core, modules extend it and Solr
extends and uses both.
3) Allow one to check out exactly what one needs to work on.
4) Modules will always depend on a certain lucene version, either a cut
release or trunk. When it's released, its build.xml will be changed as part
of the release process to point to the lucene release (not trunk!) it
supports and depends on.
5) Same for Solr.

When a patch for Solr needs to change code in lucene, it is done in both, by
two different patches. Both are committed within the same issue. Since each
trunk can depend on the other's trunk, this shouldn't be a problem.

Indeed, it will complicate the build.xmls a bit - like it's done today for
core lucene and backwards. But that's ok I think. I don't expect all Solr
issues to require a change in lucene, nor will all modules issues.
So that change to the build.xml should not be a frequent operation.

Another thing this will change (and I think for the better) is that a Solr
release might require cutting Lucene and modules releases too, and I think we
should be flexible about that. This is also not something I think will be
frequent ... like today, Solr could still be limited to a certain lucene
release or trunk revision.

I still think this is in line w/ one project, one codebase, just different
levels of the really big parts (Solr, lucene and modules). Committers can be
given access to  which will give them access to everything. Others
(modules-committers) can be given access to just that folder (hijacking a
bit from the other thread).

The flexibility of being able to checkout lucene code only is important, at
least to me. I wouldn't want to lose it.

On the IRC stuff - I know that we cannot prevent anyone from discussing
issues anywhere, and I respect that freedom. It's just that some time ago I
was told that I shouldn't hold 'private' discussions on Lucene, outside the
community. I know that this IRC channel, that's called #lucene, is not
completely outside the community, but here's how it looks to the outsider
(not on IRC):
1) An issue is opened w/ comment "summarizing discussion on IRC ...".
2) Then a couple of hours later (or days), new comment: "more discussion
summary on IRC".
3) Then some comments, some from people who are not on IRC
4) Then more comments (from an IRC-er): "ok we've discussed this and here's
what we came up with ..."

Feels like we're on a need-to-know basis here. Remember that when a
discussion is fully open, you might have some comments on what was said in
the process. When you are given the final decision, or a summary, you cannot
comment on what you weren't told. That's a bit frustrating ... though I'm
trying very hard to be involved w/ the mailing list, it feels like I miss
TONS of discussions on IRC ... and what seems worse (as I read somewhere in
the thread) is that you can open an issue w/ an idea (like happened to me),
just to discover the folks on IRC took it all the way to design and impl
proposals, and I was left to read the summarization ...

So by no means am I trying to suggest that IRC discussions should stop, as I
don't, can't and won't ever have control over that. Just like I cannot keep
two people sitting in adjacent rooms from discussing issues or Lucene outside
the list. But I'd feel better if, once a discussion makes it to the list or an
issue, it'd be conducted there from then on, and not as snippets/summaries of
the IRC discussion. Can we keep at least that?

I don't want to get people off their seats w/ that request :). I'm not even
sure I'm in a position to make such requests :). But I'd appreciate if it
can be at least discussed (not on IRC).

Shai

On Tue, Mar 16, 2010 at 5:48 PM, Grant Ingersoll wrote:

>
> On Mar 16, 2010, at 10:18 AM, Mark Miller wrote:
>
> > On 03/16/2010 10:09 AM, Yonik Seeley wrote:
> >> On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote:
> >>
> >>> Also, we're in review-and-commit process, not commit-and-review.
> >>> Changes have to be proposed, discussed and ideally attached to jira
> >>> as patches first.
> >>>
> >> Correction, just for the sake of avoiding future confusion (i.e. I'm
> >> not making any point about this thread):
> >>
> >> Lucene and Solr have always officially been CTR.
> >> For trunk, we normally use a bit of informal lazy consensus for
> >> anything big, hard, or that might be controversial... but we are not
> >> officially RTC.
> >>
> >> -Yonik
> >>
> >> -

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846028#action_12846028
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Carrying over from LUCENE-2312.  I'm proposing that, for starters, we have a 
byte slice writer and then: lock, move or copy(?) the bytes from the writable 
byte pool/writer to a read-only byte block pool, unlock.  This sounds like a 
fairly self-contained thing that can be unit tested at a low level.

Mike, can you add a bit as to how this could work?  Also, what is the 
IntBlockPool used for?  




[jira] Updated: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2326:
--

Fix Version/s: 3.1
   Flex Branch

I think the ideal case for this would be that the backwards folder simply 
contains the src-folder of the previous branch (after creation). No extra 
folder like now in between, so it looks like "/backwards/src/...". After a 
release, one would first "svn rm" the old and then "svn copy" the src folder of 
the previously created release branch to trunk. I would add this to the release 
todo.

On this change, all committers must first manually do an operating-system "rm 
-rf" on the backwards folder by calling "ant clean-backwards" before svn up. 
Maybe create a patch before :-)

> Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards 
> branch and linking snowball tests by svn:externals
> ---
>
> Key: LUCENE-2326
> URL: https://issues.apache.org/jira/browse/LUCENE-2326
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: Flex Branch, 3.1
>
>
> As we often need to update backwards tests together with trunk and always 
> have to update the branch first, record rev no, and update build xml, I would 
> simply like to do a svn copy/move of the backwards branch.
> After a release, this is simply also done:
> {code}
> svn rm backwards
> svn cp releasebranch backwards
> {code}
> By this we can simply commit in one pass, create patches in one pass.
> The snowball tests are currently downloaded by svn.exe, too. These need a 
> fixed version for checkout. I would like to change this to use svn:externals. 
> Will provide patch, soon.




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-16 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846002#action_12846002
 ] 

Shai Erera commented on LUCENE-2310:


I agree. Then supporting both the deprecated and the new API should be easy.

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch
>
>
> In order to move field type like functionality into its own class, we really 
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
> Currently AbstractField depends on Field, and does not provide much more 
> functionality that storing fields, most of which are being moved over to 
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
> possible Fieldable), moving much of the functionality into Field and 
> FieldType.




[jira] Commented: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846000#action_12846000
 ] 

Robert Muir commented on LUCENE-2326:
-

I agree, I think it's nice to see a patch to Lucene that includes 
any changes to the backwards tests.

Mike did this with LUCENE-2111 and I was shocked, until
I found out he was doing it manually with cat.




[jira] Created: (LUCENE-2326) Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards branch and linking snowball tests by svn:externals

2010-03-16 Thread Uwe Schindler (JIRA)
Remove SVN.exe and revision numbers from build.xml by svn-copy the backwards 
branch and linking snowball tests by svn:externals
---

 Key: LUCENE-2326
 URL: https://issues.apache.org/jira/browse/LUCENE-2326
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler


As we often need to update backwards tests together with trunk and always have 
to update the branch first, record the rev no, and update build.xml, I would 
simply like to do an svn copy/move of the backwards branch.

After a release, this is done just as simply:
{code}
svn rm backwards
svn cp releasebranch backwards
{code}

By this we can simply commit in one pass, create patches in one pass.

The snowball tests are currently downloaded by svn.exe, too. These need a fixed 
version for checkout. I would like to change this to use svn:externals. Will 
provide patch, soon.




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845978#action_12845978
 ] 

Michael Busch commented on LUCENE-2312:
---

{quote}
think we simply need a way to publish byte arrays to all
threads? Michael B. can you post something of what you have so
we can get an idea of how your system will work (ie, mainly what
the assumptions are)?
{quote}

It's kinda complicated to explain and currently differs from Lucene's TermHash 
classes a lot.  I'd prefer to wait a little bit until I have verified that my 
solution works.

I think here we should really tackle LUCENE-2324 first - it's a prereq.  Wanna 
help with that, Jason?




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845971#action_12845971
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

To clarify the above comment, DW's update doc method would acquire a mutex.  
The flush bytes method would also acquire that mutex when it copies existing 
writeable bytes over to the readable bytes thing (pool?).
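
A minimal sketch of that locking scheme, to make the idea concrete. Every name 
here (BufferedBytes, updateDocument, flushBytes, readableSnapshot) is 
hypothetical and is not a Lucene API; a single growable byte array stands in 
for the byte block pool, and the "readable bytes thing" is assumed to be an 
immutable copy published through a volatile field.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the DocumentsWriter side; not Lucene's class. */
final class BufferedBytes {
    private final Object mutex = new Object();       // shared by update and flush
    private byte[] writable = new byte[16];          // only touched while holding mutex
    private int used = 0;                            // number of valid bytes in writable
    private volatile byte[] readable = new byte[0];  // last flushed copy, for readers

    /** Plays the role of the update doc method: appends under the mutex. */
    void updateDocument(byte[] docBytes) {
        synchronized (mutex) {
            int needed = used + docBytes.length;
            if (needed > writable.length) {
                writable = Arrays.copyOf(writable, Math.max(writable.length * 2, needed));
            }
            System.arraycopy(docBytes, 0, writable, used, docBytes.length);
            used = needed;
        }
    }

    /** Plays the role of the flush bytes method: copies under the same mutex. */
    void flushBytes() {
        synchronized (mutex) {
            readable = Arrays.copyOf(writable, used); // volatile write publishes the copy
        }
    }

    /** Readers never take the mutex; they see the last flushed copy. */
    byte[] readableSnapshot() {
        return readable;
    }
}
```

Because the copy happens while the mutex is held, a flush can never observe a 
half-written document. The cost in this sketch is one full copy per flush, 
which a real implementation would presumably want to avoid.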




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-16 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845972#action_12845972
 ] 

Chris Male commented on LUCENE-2310:


I recommend we keep it as a List, since that makes it easier to offer different 
iterators by FieldType criteria.  A Map would support get and remove 
better, but I think we want to move people to using Iterators, and the remove 
method is there for a case we don't know of yet.
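
As a rough illustration of "different iterators by criteria" over a List, here 
is a small sketch. SimpleField and SimpleDocument are made-up stand-ins, not 
the Lucene Field and Document classes, and the criterion is modelled as a plain 
java.util.function.Predicate, which is an assumption of this example only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical field with one type-like flag. */
final class SimpleField {
    final String name;
    final boolean stored;

    SimpleField(String name, boolean stored) {
        this.name = name;
        this.stored = stored;
    }
}

/** Hypothetical document that keeps its fields in insertion order. */
final class SimpleDocument {
    private final List<SimpleField> fields = new ArrayList<>();

    void add(SimpleField field) {
        fields.add(field);
    }

    /** Returns a read-only iterator over the fields matching the criterion. */
    Iterator<SimpleField> iterator(Predicate<SimpleField> criteria) {
        return fields.stream().filter(criteria).iterator();
    }
}
```

A caller could then ask for `doc.iterator(f -> f.stored)` to walk only the 
stored fields, in the order they were added. The iterator in this sketch does 
not support remove.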

I'll create a patch with these ideas shortly.

Cheers!




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-16 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845968#action_12845968
 ] 

Shai Erera commented on LUCENE-2310:


That was usually the approach. You provide new methods and deprecate old ones, 
but both work, and not in an XOR mode: we need to ensure that if people call 
both, they still function properly. Unless this has changed, in which case it 
should be clearly documented.

But I don't think it is a big problem to support both? If Document still keeps 
its fields in a List then all should remain the same. We could have a 4.0 note 
to switch to a Map-based DS to better support remove, but that's questionable 
because we'd need to maintain the ordering of the fields (the order in which 
they were inserted). Personally I don't think the ordering should matter much 
to the user, but that's the current implementation.




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845969#action_12845969
 ] 

Michael Busch commented on LUCENE-2312:
---

{quote}
I thought we're moving away from byte block pooling and we're
going to try relying on garbage collection? Does a volatile
object[] publish changes to all threads? Probably not, again
it'd just be the pointer.
{quote}

We were so far only considering moving away from pooling of (Raw)PostingList 
objects.  Pooling byte blocks might have more performance impact - they're more 
heavy-weight.
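
On the quoted question about a volatile object[]: in Java, declaring an array 
field volatile only makes reads and writes of the array reference volatile, not 
writes to individual elements. Writes made before a volatile store of the 
reference are, however, visible to any thread that subsequently reads that 
reference. The sketch below uses this to publish filled blocks by swapping in a 
new first-dimension array; the byte arrays themselves are not copied. BlockTable 
is a hypothetical name, and a single writer thread is assumed.

```java
import java.util.Arrays;

/** Hypothetical table of published blocks; assumes exactly one writer thread. */
final class BlockTable {
    // Only the reference is volatile; elements are never written after publication.
    private volatile byte[][] blocks = new byte[0][];

    /** Publishes a block the writer has finished filling. */
    void publish(byte[] filledBlock) {
        byte[][] old = blocks;
        byte[][] next = Arrays.copyOf(old, old.length + 1); // copies pointers only
        next[old.length] = filledBlock;
        blocks = next; // volatile write: earlier writes become visible to readers of next
    }

    /** Number of published blocks. */
    int size() {
        return blocks.length;
    }

    /** Returns a published block; valid for 0 <= index < size(). */
    byte[] get(int index) {
        return blocks[index];
    }
}
```

If per-element volatile semantics were wanted instead, 
java.util.concurrent.atomic.AtomicReferenceArray would be the standard library 
option.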




[jira] Commented: (LUCENE-1488) multilingual analyzer based on icu

2010-03-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845951#action_12845951
 ] 

Robert Muir commented on LUCENE-1488:
-

Thanks for the review Uwe! moving forwards...

> multilingual analyzer based on icu
> --
>
> Key: LUCENE-1488
> URL: https://issues.apache.org/jira/browse/LUCENE-1488
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: ICUAnalyzer.patch, LUCENE-1488.patch, LUCENE-1488.patch, 
> LUCENE-1488.patch, LUCENE-1488.patch, LUCENE-1488.txt, LUCENE-1488.txt
>
>
> The standard analyzer in lucene is not exactly unicode-friendly with regards 
> to breaking text into words, especially with respect to non-alphabetic 
> scripts.  This is because it is unaware of unicode bounds properties.
> I actually couldn't figure out how the Thai analyzer could possibly be 
> working until i looked at the jflex rules and saw that codepoint range for 
> most of the Thai block was added to the alphanum specification. defining the 
> exact codepoint ranges like this for every language could help with the 
> problem but you'd basically be reimplementing the bounds properties already 
> stated in the unicode standard. 
> in general it looks like this kind of behavior is bad in lucene for even 
> latin, for instance, the analyzer will break words around accent marks in 
> decomposed form. While most latin letter + accent combinations have composed 
> forms in unicode, some do not. (this is also an issue for asciifoldingfilter 
> i suppose). 
> I've got a partially tested standardanalyzer that uses icu Rule-based 
> BreakIterator instead of jflex. Using this method you can define word 
> boundaries according to the unicode bounds properties. After getting it into 
> some good shape i'd be happy to contribute it for contrib but I wonder if 
> theres a better solution so that out of box lucene will be more friendly to 
> non-ASCII text. Unfortunately it seems jflex does not support use of these 
> properties such as [\p{Word_Break = Extend}] so this is probably the major 
> barrier.
> Thanks,
> Robert




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845950#action_12845950
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

{quote}The tricky part is to make sure that a reader always sees
a consistent snapshot of the index. At the same time a reader
must not follow pointers to non-published locations (e.g. array
blocks). {quote}

Right. In what case in the term enum, term docs chain of doc
scoring would a reader potentially try to follow a pointer to a
byte array that doesn't exist? I think we're strictly preventing
it via last doc ids? Also, when we flush, I think we need to
block further doc writing (via an RW lock?) and wait for any
currently writing docs to complete, then forcibly publish the
byte arrays, then release the write lock? This way we always
have published data that's consistent for readers (eg, the
inverted index can be read completely, and there won't be any
wild writes still occurring to a byte array that's been
published).
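
A minimal sketch of the flush sequence described above, assuming a
hypothetical RamBufferPublisher class (none of these names exist in
Lucene). Indexing threads share the read lock while adding a document;
flush takes the write lock, which waits for in-flight documents and
blocks new ones, then swaps in an immutable snapshot for readers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names exist in Lucene.
class RamBufferPublisher {
  // Document writers share the read lock; flush takes the write lock exclusively.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Bytes written since the last flush; not visible to readers.
  private final List<byte[]> pending = new ArrayList<byte[]>();
  // Readers only ever see this immutable snapshot.
  private volatile List<byte[]> published = Collections.emptyList();

  // Called by an indexing thread for each completed document.
  void writeDoc(byte[] docBytes) {
    lock.readLock().lock();
    try {
      synchronized (pending) {
        pending.add(docBytes.clone());
      }
    } finally {
      lock.readLock().unlock();
    }
  }

  // Blocks new writes, waits for in-flight ones, then publishes a consistent snapshot.
  void flush() {
    lock.writeLock().lock();
    try {
      List<byte[]> snapshot = new ArrayList<byte[]>(published);
      snapshot.addAll(pending);
      pending.clear();
      published = Collections.unmodifiableList(snapshot);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Safe from any thread; the returned list never changes.
  List<byte[]> publishedSnapshot() {
    return published;
  }
}
```

Copying the snapshot on every flush is only to keep the sketch short;
it says nothing about how the real byte block pool would be shared.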




Re: lucene and solr trunk

2010-03-16 Thread Grant Ingersoll

On Mar 16, 2010, at 10:18 AM, Mark Miller wrote:

> On 03/16/2010 10:09 AM, Yonik Seeley wrote:
>> On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch  wrote:
>> 
>>> Also, we're in review-and-commit process, not commit-and-review.  Changes 
>>> have to be
>>> proposed, discussed and ideally attached to jira as patches first.
>>> 
>> Correction, just for the sake of avoiding future confusion (i.e. I'm
>> not making any point about this thread):
>> 
>> Lucene and Solr have always officially been CTR.
>> For trunk, we normally use a bit of informal lazy consensus for
>> anything big, hard, or that might be controversial... but we are not
>> officially RTC.
>> 
>> -Yonik
> 
> In any case, this is a branch. People really want to enforce RTC on a 
> branch??? Even if that was our official process on trunk (which I agree it 
> has not been) that's not how the flex branch worked. That's not how the 
> solr_cloud branch worked. That's not how other previous branches have worked.
> 
> IMO - anyone should be able to create a branch for anything - to play around 
> with whatever they want. We should encourage this. Branches are good. And 
> they take up little space.
> 

+1.  Furthermore, it is incumbent on the people working on the branch to then 
present and discuss when/how to merge to trunk, just like any big patch.



[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845943#action_12845943
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

I thought we're moving away from byte block pooling and we're
going to try relying on garbage collection? Does a volatile
object[] publish changes to all threads? Probably not, again
it'd just be the pointer.

In the case of posting/termdocs iteration, I'm more concerned
that the lastDocID be volatile than with the byte array
containing extra data. Extra docs are OK in the byte array
because we'll simply stop iterating when we've reached the last
doc. Though with our system, we shouldn't even run into this
either, meaning a byte array is copied and published, perhaps
the master byte array is still being written to and the same
byte array (by id or something) is published again? Then we'd
have multiple versions of byte arrays. That could be bad.
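
To illustrate the volatile question: a volatile array field only makes
the reference volatile, not the elements. With a single writer,
though, a volatile counter written after the element store is enough
to publish that store to any reader that reads the counter first. A
minimal sketch, assuming a hypothetical PostingsBuffer class (not
Lucene code):

```java
// Hypothetical single-writer postings buffer; not Lucene code.
class PostingsBuffer {
  private final int[] docIds;
  // Number of doc IDs that are safe to read. The volatile write in add()
  // publishes every array store that preceded it to any thread that
  // subsequently reads count.
  private volatile int count = 0;

  PostingsBuffer(int capacity) {
    docIds = new int[capacity];
  }

  // Must only be called from one writer thread.
  void add(int docId) {
    int n = count;
    docIds[n] = docId; // plain store into the array
    count = n + 1;     // volatile store: publishes the element above
  }

  // Safe from any thread: reads the volatile bound first, then copies
  // only the elements below that bound.
  int[] snapshot() {
    int n = count;
    int[] copy = new int[n];
    System.arraycopy(docIds, 0, copy, 0, n);
    return copy;
  }
}
```

A reader may miss a doc that is being added at that moment, but it
never reads past the published bound, which is the "stop iterating at
the last doc" behaviour described above.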

Because there is one DW per thread, there's only one document
being indexed at a time. There's no writer concurrency. This
leaves reader concurrency. However after each doc, we *could*
simply flush all bytes related to the doc. Any new docs must
simply start writing to new byte arrays? The problem with this
is, unless the byte arrays are really small, we'll have a lot of
extra data around, well, unless the byte arrays are trimmed
before publication. Or we can simply RW lock (or some other
analogous thing) individual byte arrays, not publish them after
each doc, then only publish them when get reader is called. To
clarify, the RW lock (or flag) would only be per byte array, in
fact, all writing to the byte array could necessarily cease on
flush, and new byte arrays allocated. The published byte array
could point to the next byte array. 

I think we simply need a way to publish byte arrays to all
threads? Michael B. can you post something of what you have so
we can get an idea of how your system will work (ie, mainly what
the assumptions are)? 

We do need to strive for correctness of data, and perhaps
performance will be slightly impacted (though compared with our
current NRT we'll have an overall win). 




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-16 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845939#action_12845939
 ] 

Chris Male commented on LUCENE-2310:


{quote}
So overall we agree on the changes that need to be made. BTW, when you 
deprecate a method, you usually change it to call the new API or change it to 
use the new data structures or whatever. So we need to think how to impl 
getFields such that if one calls remove, numFields or uses the iterator in an 
interleaving manner, his code doesn't break ... I don't think it should be hard 
but it might be a good idea to even write such a (deprecated) unit test
{quote}

I'm not sure we have to change getFields.  We can just deprecate it, and point 
people to the new methods.  I think it'd be more effort than it's worth to 
create a List impl that calls the new methods.  Was that what you were 
implying?  I do agree it's worth writing a test to ensure all old functionality 
can be done via the new methods somehow.
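
A rough sketch of the "just deprecate it" option, assuming a
hypothetical Doc class that stores field names as plain strings (this
is not Lucene's Document, and the new method names are only borrowed
from the discussion). getFields keeps its old behaviour and merely
gains a deprecation notice pointing at the replacements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a document class; not Lucene's Document.
class Doc implements Iterable<String> {
  private final List<String> fields = new ArrayList<String>();

  void add(String field) {
    fields.add(field);
  }

  // New-style accessors that callers are pointed to.
  int numFields() {
    return fields.size();
  }

  boolean remove(String field) {
    return fields.remove(field);
  }

  // Read-only iteration over the current fields.
  public Iterator<String> iterator() {
    return Collections.unmodifiableList(fields).iterator();
  }

  /**
   * @deprecated use {@link #numFields()}, {@link #remove(String)} or
   *     {@link #iterator()} instead.
   */
  @Deprecated
  List<String> getFields() {
    // Left untouched: still exposes the live backing list, as before.
    return fields;
  }
}
```

A compatibility test would then check that anything done through
getFields can also be done with the new methods, for example that the
counts agree before and after a remove.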

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch
>
>
> In order to move field type like functionality into its own class, we really 
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
> Currently AbstractField depends on Field, and does not provide much more 
> functionality than storing fields, most of which are being moved over to 
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
> possibly Fieldable), moving much of the functionality into Field and 
> FieldType.




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-16 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845936#action_12845936
 ] 

Shai Erera commented on LUCENE-2310:


I'm sorry for the confusion - I got used to all the deprecation discussions so 
much that it's embedded in my replies :) - when I wrote "instead getFields" I 
meant that it will be deprecated, and we'll carry it w/ us until 4.0 is out.

So overall we agree on the changes that need to be made. BTW, when you 
deprecate a method, you usually change it to call the new API or change it to 
use the new data structures or whatever. So we need to think how to impl 
getFields such that if one calls remove, numFields or uses the iterator in an 
interleaving manner, his code doesn't break ... I don't think it should be hard 
but it might be a good idea to even write such a (deprecated) unit test




[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845934#action_12845934
 ] 

Robert Muir commented on LUCENE-2098:
-

I think the best way to proceed would be to make it easy to benchmark 
CharFilters in contrib/benchmark, especially this HTML stripping one.

Honestly we don't even know for sure any performance degradation reported
in the original link is really due to BaseCharFilter yet, so I think we need
to benchmark and profile.


> make BaseCharFilter more efficient in performance
> -
>
> Key: LUCENE-2098
> URL: https://issues.apache.org/jira/browse/LUCENE-2098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-2098.patch
>
>
> Performance degradation in Solr 1.4 was reported. See:
> http://www.lucidimagination.com/search/document/43c4bdaf5c9ec98d/html_stripping_slower_in_solr_1_4
> The inefficiency has been pointed out in BaseCharFilter javadoc by Mike:
> {panel}
> NOTE: This class is not particularly efficient. For example, a new class 
> instance is created for every call to addOffCorrectMap(int, int), which is 
> then appended to a private list. 
> {panel}




[jira] Commented: (LUCENE-2320) Add MergePolicy to IndexWriterConfig

2010-03-16 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845930#action_12845930
 ] 

Shai Erera commented on LUCENE-2320:


But it's MP which requires IW. So how will your policeman (like the name :)) 
proposal prevent it? I think that setting IW on MP is not such a bad thing. If 
MP needs it, then it needs it. The question now is to what length do we want to go 
w/ it: make it sort of final (in which case SetOnce makes sense) or settle w/ a 
setIW which is simpler.

This issue is more about moving MP into IWC than refactoring MP. I'd like to keep 
it focused on that as much as possible. I don't mean that we should stop 
discussing the refactoring, just to say it can be done separately. After MP 
moves to IWC and all code is converted to use the new API, refactoring MP 
internally should not affect the API level, right?

If u agree w/ that, then how do u propose to continue? W/ SetOnce or a simple 
setter?  
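
For comparison with a plain setter, here is a minimal sketch of the
SetOnce idea as the issue description outlines it: a synchronized set
that fails on the second call, and an unsynchronized get over a
volatile field. The class body and the choice of
IllegalStateException (standing in for the proposed
AlreadySetException) are assumptions, not the actual patch.

```java
// Sketch of the SetOnce idea from the issue description; details are assumptions.
final class SetOnce<T> {
  // volatile so that get() needs no synchronization.
  private volatile T value;
  // Guarded by "this"; tracks whether set() has already succeeded.
  private boolean alreadySet = false;

  // Stores the value on the first call and throws on any later call.
  synchronized void set(T newValue) {
    if (alreadySet) {
      // Stand-in for the AlreadySetException mentioned in the issue.
      throw new IllegalStateException("value was already set");
    }
    value = newValue;
    alreadySet = true;
  }

  // Returns the value, or null if set() has not been called yet.
  T get() {
    return value;
  }
}
```

With something like this, MP could hold a final SetOnce field and IW
would call set(this) once from its constructor; a simple setIW would
be shorter but would not prevent a second assignment.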

> Add MergePolicy to IndexWriterConfig
> 
>
> Key: LUCENE-2320
> URL: https://issues.apache.org/jira/browse/LUCENE-2320
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-2320.patch
>
>
> Now that IndexWriterConfig is in place, I'd like to move MergePolicy to it as 
> well. The change is not straightforward and so I've kept it for a separate 
> issue. MergePolicy requires in its ctor an IndexWriter, however none can be 
> passed to it before an IndexWriter actually exists. And today IW may create 
> an MP just for it to be overridden by the application one line afterwards. I 
> don't want to make iw member of MP non-final, or settable by extending 
> classes, however it needs to remain protected so they can access it directly. 
> So the proposed changes are:
> * Add a SetOnce object (to o.a.l.util), or Immutable, which can only be set 
> once (hence its name). It'll have the signature SetOnce<T> w/ *synchronized 
> set* and *T get()*. T will be declared volatile, so that get() won't be 
> synchronized.
> * MP will define a *protected final SetOnce<IndexWriter> writer* instead of 
> the current writer. *NOTE: this is a bw break*. any suggestions are welcomed.
> * MP will offer a public default ctor, together with a set(IndexWriter).
> * IndexWriter will set itself on MP using set(this). Note that if set will be 
> called more than once, it will throw an exception (AlreadySetException - or 
> does someone have a better suggestion, preferably an already existing Java 
> exception?).
> That's the core idea. I'd like to post a patch soon, so I'd appreciate your 
> review and proposals.




[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845919#action_12845919
 ] 

Michael McCandless commented on LUCENE-2098:


bq. I think this is why it got slower with my patch, in practice it didn't 
matter that this thing did 'backwards linear lookup' due to this reason?

Ahh yes since presumably the test was simply looking up the offsets for the 
current token...




[jira] Commented: (LUCENE-1488) multilingual analyzer based on icu

2010-03-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845917#action_12845917
 ] 

Uwe Schindler commented on LUCENE-1488:
---

Attribute looks good! I would only fix toString() to match the default impl by 
using syntax variableName + "=" + value, here  "code="+getName(code). This 
makes AttributeSource.toString() look nice.
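
A tiny sketch of the suggested toString() form, assuming a
hypothetical attribute class with a made-up code-to-name table (the
real attribute is in the LUCENE-1488 patch and may look different):

```java
// Hypothetical attribute holding a script code; the real class is in the
// LUCENE-1488 patch and may differ.
class ScriptCodeSketch {
  // Made-up code-to-name table, for illustration only.
  private static final String[] NAMES = {"Common", "Latin", "Thai"};

  private int code;

  void setCode(int code) {
    this.code = code;
  }

  // Stand-in for the getName(code) call mentioned in the comment.
  static String getName(int code) {
    return (code >= 0 && code < NAMES.length) ? NAMES[code] : "Unknown";
  }

  // variableName + "=" + value, the form suggested above.
  @Override
  public String toString() {
    return "code=" + getName(code);
  }
}
```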

> multilingual analyzer based on icu
> --
>
> Key: LUCENE-1488
> URL: https://issues.apache.org/jira/browse/LUCENE-1488
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: ICUAnalyzer.patch, LUCENE-1488.patch, LUCENE-1488.patch, 
> LUCENE-1488.patch, LUCENE-1488.patch, LUCENE-1488.txt, LUCENE-1488.txt
>
>
> The standard analyzer in lucene is not exactly unicode-friendly with regards 
> to breaking text into words, especially with respect to non-alphabetic 
> scripts.  This is because it is unaware of unicode bounds properties.
> I actually couldn't figure out how the Thai analyzer could possibly be 
> working until i looked at the jflex rules and saw that codepoint range for 
> most of the Thai block was added to the alphanum specification. defining the 
> exact codepoint ranges like this for every language could help with the 
> problem but you'd basically be reimplementing the bounds properties already 
> stated in the unicode standard. 
> in general it looks like this kind of behavior is bad in lucene for even 
> latin, for instance, the analyzer will break words around accent marks in 
> decomposed form. While most latin letter + accent combinations have composed 
> forms in unicode, some do not. (this is also an issue for asciifoldingfilter 
> i suppose). 
> I've got a partially tested standardanalyzer that uses icu Rule-based 
> BreakIterator instead of jflex. Using this method you can define word 
> boundaries according to the unicode bounds properties. After getting it into 
> some good shape i'd be happy to contribute it for contrib but I wonder if 
> theres a better solution so that out of box lucene will be more friendly to 
> non-ASCII text. Unfortunately it seems jflex does not support use of these 
> properties such as [\p{Word_Break = Extend}] so this is probably the major 
> barrier.
> Thanks,
> Robert




Re: lucene and solr trunk

2010-03-16 Thread Yonik Seeley
On Tue, Mar 16, 2010 at 5:42 AM, Michael McCandless wrote:
> I think I like the 1st option best (lucene moves as subdir to solr's
> current trunk SVN path), but I don't feel strongly.
>
> This'd mean one could simply checkout lucene alone and do everything
> you can do today.
>
> But if you check out solr, you also get a full checkout of lucene, and
> solr's build.xml will go and build lucene, copy over its jars to its
> lib folder, and then do everything it currently does.
>
> I think?
>
> This small step is not much change over what we have today -- the code
> simply moves, unchanged, except for some fixes to solr's build.xml to
> go and build its lucene subdir first.

Huh - I was leaning more toward putting solr under lucene because I
thought that might be more acceptable to the lucene folks (actually,
now lucene/solr folks) than vice-versa.

But your points make perfect sense.

> The bigger stuff, ideas on modules like renaming contrib->modules,
> consolidating all analyzers, queries, queryparsers, highlighters, all
> comes later.

+1

-Yonik




[jira] Updated: (LUCENE-1488) multilingual analyzer based on icu

2010-03-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1488:


Attachment: LUCENE-1488.patch

uploading a dump of my workspace, so Uwe can review the new attribute.




Re: lucene and solr trunk

2010-03-16 Thread Mark Miller

On 03/16/2010 10:09 AM, Yonik Seeley wrote:
> On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch wrote:
>
>> Also, we're in review-and-commit process, not commit-and-review.  Changes
>> have to be proposed, discussed and ideally attached to jira as patches first.
>
> Correction, just for the sake of avoiding future confusion (i.e. I'm
> not making any point about this thread):
>
> Lucene and Solr have always officially been CTR.
> For trunk, we normally use a bit of informal lazy consensus for
> anything big, hard, or that might be controversial... but we are not
> officially RTC.
>
> -Yonik

In any case, this is a branch. People really want to enforce RTC on a 
branch??? Even if that was our official process on trunk (which I agree 
it has not been) that's not how the flex branch worked. That's not how 
the solr_cloud branch worked. That's not how other previous branches 
have worked.


IMO - anyone should be able to create a branch for anything - to play 
around with whatever they want. We should encourage this. Branches are 
good. And they take up little space.



Branch changes have to be proposed, discussed, and attached to JIRA? 
Uggg - I certainly hope not.


Branches should be considered replacements for huge unwieldy patches. Do 
I have to propose and discuss before I put up a patch?


--
- Mark

http://www.lucidimagination.com







Re: lucene and solr trunk

2010-03-16 Thread Yonik Seeley
On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch  wrote:
> Also, we're in review-and-commit process, not commit-and-review.  Changes 
> have to be
> proposed, discussed and ideally attached to jira as patches first.

Correction, just for the sake of avoiding future confusion (i.e. I'm
not making any point about this thread):

Lucene and Solr have always officially been CTR.
For trunk, we normally use a bit of informal lazy consensus for
anything big, hard, or that might be controversial... but we are not
officially RTC.

-Yonik




Re: lucene and solr trunk

2010-03-16 Thread Mark Miller

On 03/16/2010 09:05 AM, Andrzej Bialecki wrote:
> On 2010-03-16 12:29, Mark Miller wrote:
>
>> From our perspective, we would have been just as happy with a branch on
>> my local hard drive! That would have taken longer to setup though.
>
> You could have used git instead. There is a good integration between
> git and svn, and it's much easier (a giant understatement...) to
> handle branching and merging in git, both between git branches and
> syncing with external svn.

Yeah, we have actually discussed doing things like GIT in the past - 
prob main reason we didn't is learning curve at the moment. I haven't 
used it yet.


--
- Mark

http://www.lucidimagination.com







Re: lucene and solr trunk

2010-03-16 Thread Andrzej Bialecki

On 2010-03-16 12:29, Mark Miller wrote:
> From our perspective, we would have been just as happy with a branch on
> my local hard drive! That would have taken longer to setup though.

You could have used git instead. There is a good integration between git 
and svn, and it's much easier (a giant understatement...) to handle 
branching and merging in git, both between git branches and syncing with 
external svn.


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





Re: lucene and solr trunk

2010-03-16 Thread Erick Erickson
My snap impression is that moving lucene to a sub-tree
under SOLR would introduce some confusion in the minds
of new folks looking at the code. *We* all know that Lucene
stands by itself, but putting it under SOLR makes that less
obvious. I claim that there would be questions like "so can
I just use Lucene without SOLR?".

That said, the questions about release management, branching,
tagging, etc. take complete precedence over minor
confusion when the answer is "just go to directory X and
checkout if you want Lucene only".

FWIW
Erick



On Tue, Mar 16, 2010 at 8:30 AM, Robert Muir  wrote:

> On Tue, Mar 16, 2010 at 3:43 AM, Simon Willnauer
>  wrote:
>
> > One more thing which I wonder about even more is that this whole
> > merging happens so quickly for reasons I don't see right now. I don't
> > want to keep anybody from making progress but it appears like a rush
> > to me.
>
>
> By the way, the serious changes we applied to the branch, most of them
> have been sitting in JIRA over 3 months not doing much: SOLR-1659
>
> if you follow the linked issues, you can see all the stuff that got
> put in the branch... the branch was helpful for me, as I could help
> Mark with the "ton of little things", like TokenStreams embedded
> inside JSP files :)
>
> As it's just a branch, if you want to go look at those patches
> (especially anything I did) and provide technical feedback, that would
> be great!
>
> But I think it's a mistake to say things are rushed when the work has
> been done for months.
>
> --
> Robert Muir
> rcm...@gmail.com
>


Re: lucene and solr trunk

2010-03-16 Thread Grant Ingersoll

On Mar 16, 2010, at 3:51 AM, Michael Busch wrote:

> On 3/16/10 12:43 AM, Simon Willnauer wrote:
>
> Me too.  I don't have the time to follow IRC in addition to jira and
> mailing lists.  I know I've been missing stuff, because in the past I
> commented on jira issues and later was told that my questions were
> already discussed thoroughly on IRC.  I've also seen jira issues that
> start with something like "Summary of IRC discussion:".

I too am troubled by the likes of this and have been feeling much the same way, 
as many already know.  It is on my list of things to discuss with the 
community, but I was going to wait a week or so to send, to let the volume die 
down a bit.

-Grant



[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845887#action_12845887
 ] 

Robert Muir commented on LUCENE-2098:
-

Mark did some quick tests and this patch only seems to make things slower.

bq. Really most apps do not need all positions stored, ie, they only need to 
see typically the current token.

I think this is why it got slower with my patch: in practice it didn't matter 
that this thing did a 'backwards linear lookup', for exactly this reason?

> make BaseCharFilter more efficient in performance
> -
>
> Key: LUCENE-2098
> URL: https://issues.apache.org/jira/browse/LUCENE-2098
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-2098.patch
>
>
> Performance degradation in Solr 1.4 was reported. See:
> http://www.lucidimagination.com/search/document/43c4bdaf5c9ec98d/html_stripping_slower_in_solr_1_4
> The inefficiency has been pointed out in BaseCharFilter javadoc by Mike:
> {panel}
> NOTE: This class is not particularly efficient. For example, a new class 
> instance is created for every call to addOffCorrectMap(int, int), which is 
> then appended to a private list. 
> {panel}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





Re: lucene and solr trunk

2010-03-16 Thread Robert Muir
On Tue, Mar 16, 2010 at 3:43 AM, Simon Willnauer
 wrote:

> One more thing which I wonder about even more is that this whole
> merging happens so quickly for reasons I don't see right now. I don't
> want to keep anybody from making progress but it appears like a rush
> to me.


By the way, the serious changes we applied to the branch, most of them
have been sitting in JIRA over 3 months not doing much: SOLR-1659

if you follow the linked issues, you can see all the stuff that got
put in the branch... the branch was helpful for me, as I could help
Mark with the "ton of little things", like TokenStreams embedded
inside JSP files :)

As it's just a branch, if you want to go look at those patches
(especially anything I did) and provide technical feedback, that would
be great!

But I think it's a mistake to say things are rushed when the work has
been done for months.

-- 
Robert Muir
rcm...@gmail.com




Re: lucene and solr trunk

2010-03-16 Thread Mark Miller

On 03/16/2010 07:05 AM, Shalin Shekhar Mangar wrote:

> Wow, you guys are moving fast! That's a good thing.
>
> IRC is fine if you want to discuss something quickly. But it has its
> limitations. For example, I cannot follow IRC most of the time
> because I'm in a different time zone. But I don't want to stop anyone
> either. In fact, I can't do that. Nobody can.
>
> All I want to say is that once discussions have happened and a plan
> agreed upon, it may be a good idea to let solr-dev/java-dev know the
> plan. In this case I didn't know a new branch was created until I saw
> a commit notification and then Yonik's email.

Hi Shalin - I like your attitude ;)

-

Yonik's email was the notification of the plan :) Though we had no plan. 
When Robert and I made the branch we had no plan really - we just needed 
a place to put together our patches and do the final work. We were 
trying to do it with patches, but it was becoming difficult. But when we 
started we had no real plan - just to see if we could get Solr up and 
running on Lucene 3.0.1 and then trunk. Anything beyond that, we have not 
planned for - and before that was even completed, there were emails to 
java-dev about it. But we conceived nothing beyond seeing if we could 
get Solr running on the latest Lucene.


From our perspective, we would have been just as happy with a branch on 
my local hard drive! That would have taken longer to set up though.


--
- Mark

http://www.lucidimagination.com







Re: lucene and solr trunk

2010-03-16 Thread Shalin Shekhar Mangar
On Tue, Mar 16, 2010 at 3:44 PM, Mark Miller  wrote:

> On 03/16/2010 03:43 AM, Simon Willnauer wrote:
>
>>
>> One more thing which I wonder about even more is that this whole
>> merging happens so quickly for reasons I don't see right now. I don't
>> want to keep anybody from making progress but it appears like a rush
>> to me.
>>
>>
>
> Meh - I think you're just plain wrong about this. Anyone can work as fast as
> they want on anything. Nothing has happened faster than the community wants
> yet. You're too concerned. This is called discussion. Nothing has happened. In
> my opinion, the whole freak out over what goes where in svn was so overblown
> - it's so easy to move this stuff around at the drop of a hat. That's why it
> was suggested we put a branch there and no one saw anything wrong with it
> for the moment - everyone said, well we can just easily move it if someone
> has an issue - which we did. Didn't expect the freak out though. Frankly, we
> were just seeking a branch really, and didn't care where it went.
>
> Some of us are anxious to do some work - some of us are anxious to merge
> some code - no one is forcing this stuff on the others at a rapid pace -
> everyone gets their say as always. This is why we wanted a branch we could
> commit what we wanted to. SVN locations make starting the merge of code
> easier. They are easy to change. This is not like rushing index format
> changes. It's src code location - it can be moved at the drop of a hat. The
> sooner we resolve what we are going to do, the sooner we can start getting
> more work done that we hoped to get done with this merge. This thread starts
> that discussion. You can't start a discussion too early. Perhaps it leads to
> another discussion first, but there is no such thing as rushing the start of
> discussion. It doesn't say "figure it out by tomorrow, cause we are doing
> this tomorrow." It doesn't say, figure this out by next week, because we
> are doing this next week. It says let's discuss where this is going to go.
>
> I think some people just need to relax, and discuss what they would like to
> see and worry less about how fast others are working. Fast work is good. It
> means more work. Nothing is going to happen until the community figures
> things out.
>
>

>> BTW: I still have the impression that if I don't follow IRC constantly
>> I'm missing important things.
>>
>>
> That's your impression then. Follow IRC if you want. People talk all over
> the place about Lucene/Solr - many times in places you can't follow - if it
> didn't happen on the list, it didn't happen. Michael Busch follows up
> saying, "people say it was discussed thoroughly on IRC" - so what? It
> doesn't count as a valid point of reference - I haven't seen that, but you
> can just tell someone who says that so - they owe you an explanation.
>
>
Wow, you guys are moving fast! That's a good thing.

IRC is fine if you want to discuss something quickly. But it has its
limitations. For example, I cannot follow IRC most of the time because I'm
in a different time zone. But I don't want to stop anyone either. In fact, I
can't do that. Nobody can.

All I want to say is that once discussions have happened and a plan agreed
upon, it may be a good idea to let solr-dev/java-dev know the plan. In this
case I didn't know a new branch was created until I saw a commit
notification and then Yonik's email.

-- 
Regards,
Shalin Shekhar Mangar.


Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
I think I like the 1st option best (lucene moves as subdir to solr's
current trunk SVN path), but I don't feel strongly.

This'd mean one could simply check out lucene alone and do everything
you can do today.

But if you check out solr, you also get a full checkout of lucene, and
solr's build.xml will go and build lucene, copy over its jars to its
lib folder, and then do everything it currently does.

I think?

This small step is not much change over what we have today -- the code
simply moves, unchanged, except for some fixes to solr's build.xml to
go and build its lucene subdir first.

The bigger stuff, ideas on modules like renaming contrib->modules,
consolidating all analyzers, queries, queryparsers, highlighters, all
comes later.

Mike

On Mon, Mar 15, 2010 at 10:28 PM, Yonik Seeley  wrote:
> Due to a tremendous amount of work by our newly merged committer
> corps, the get-on-lucene-trunk branch (branches/solr) is ready for
> prime-time as the new solr trunk!  Lucene and Solr need to move to a
> common trunk for a host of reasons, including single patches that can
> cover both, shared tags and branches, and shared test code w/o a test
> jar.
>
> The current Lucene trunk is: .../lucene/java/trunk
> The current Solr trunk is: .../lucene/solr/trunk
>
> So, we have a few options on where to put Solr's new trunk:
>
> Lucene moves to Solr's trunk:
>  /solr/trunk, /solr/trunk/lucene
>
> Solr moves to Lucene's trunk:
>  /java/trunk, /java/trunk/solr
>
> Both projects move to a new trunk:
>  /something/trunk/java, /something/trunk/solr
>
> -Yonik



Re: lucene and solr trunk

2010-03-16 Thread Mark Miller

On 03/16/2010 03:43 AM, Simon Willnauer wrote:


> One more thing which I wonder about even more is that this whole
> merging happens so quickly for reasons I don't see right now. I don't
> want to keep anybody from making progress but it appears like a rush
> to me.


Meh - I think you're just plain wrong about this. Anyone can work as fast 
as they want on anything. Nothing has happened faster than the community 
wants yet. You're too concerned. This is called discussion. Nothing has 
happened. In my opinion, the whole freak out over what goes where in svn 
was so overblown - it's so easy to move this stuff around at the drop of 
a hat. That's why it was suggested we put a branch there and no one saw 
anything wrong with it for the moment - everyone said, well we can just 
easily move it if someone has an issue - which we did. Didn't expect the 
freak out though. Frankly, we were just seeking a branch really, and 
didn't care where it went.


Some of us are anxious to do some work - some of us are anxious to merge 
some code - no one is forcing this stuff on the others at a rapid pace - 
everyone gets their say as always. This is why we wanted a branch we 
could commit what we wanted to. SVN locations make starting the merge 
of code easier. They are easy to change. This is not like rushing index 
format changes. It's src code location - it can be moved at the drop of 
a hat. The sooner we resolve what we are going to do, the sooner we 
can start getting more work done that we hoped to get done with this 
merge. This thread starts that discussion. You can't start a discussion 
too early. Perhaps it leads to another discussion first, but there is no 
such thing as rushing the start of discussion. It doesn't say "figure it 
out by tomorrow, cause we are doing this tomorrow." It doesn't say, 
figure this out by next week, because we are doing this next week. It 
says let's discuss where this is going to go.


I think some people just need to relax, and discuss what they would like 
to see and worry less about how fast others are working. Fast work is 
good. It means more work. Nothing is going to happen until the community 
figures things out.



> BTW: I still have the impression that if I don't follow IRC constantly
> I'm missing important things.

That's your impression then. Follow IRC if you want. People talk all 
over the place about Lucene/Solr - many times in places you can't follow 
- if it didn't happen on the list, it didn't happen. Michael Busch 
follows up saying, "people say it was discussed thoroughly on IRC" - so 
what? It doesn't count as a valid point of reference - I haven't seen 
that, but you can just tell someone who says that so - they owe you an 
explanation.



--
- Mark

http://www.lucidimagination.com







[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845788#action_12845788
 ] 

Michael McCandless commented on LUCENE-2098:


Ahh ok.

Probably we should switch to parallel arrays here, to make it very fast... yes 
this will consume RAM (8 bytes per position, if we keep all of them).

Really most apps do not need all positions stored, ie, they only need to see 
typically the current token.  So maybe we could make a filter that takes a 
"lookbehind size" and it'd only keep that number of mappings cached?  That'd 
have to be > the max size of any token you may analyze, so hard to bound 
perfectly, but eg setting this to the max allowed token in IndexWriter would 
guarantee that we'd never have a miss?

For analyzers that buffer tokens... they'd have to set this max to infinity, 
or, ensure they remap the offsets before capturing the token's state?
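
A minimal sketch of the parallel-arrays idea above, for illustration only. The 
class name ParallelOffsetMap and its methods are hypothetical and are not part 
of Lucene or of the attached patch. It assumes corrections are added in 
ascending offset order and that a lookup wants the last correction at or 
before the given offset. The bounded "lookbehind size" variant is not shown.

```java
import java.util.Arrays;

/** Hypothetical sketch: offset corrections stored in two parallel int arrays. */
final class ParallelOffsetMap {
    // offsets[i] is the output offset at which correction i starts (ascending).
    private int[] offsets = new int[16];
    // cumulativeDiffs[i] is the amount to add for offsets >= offsets[i].
    private int[] cumulativeDiffs = new int[16];
    private int size;

    /** Appends a correction; offsets are assumed to arrive in ascending order. */
    void add(int off, int cumulativeDiff) {
        if (size > 0 && off < offsets[size - 1]) {
            throw new IllegalArgumentException("offsets must be added in ascending order");
        }
        if (size == offsets.length) {
            // Grow both arrays together so they stay parallel (8 bytes per mapping).
            offsets = Arrays.copyOf(offsets, size * 2);
            cumulativeDiffs = Arrays.copyOf(cumulativeDiffs, size * 2);
        }
        offsets[size] = off;
        cumulativeDiffs[size] = cumulativeDiff;
        size++;
    }

    /** Binary search for the last correction whose offset is <= currentOff. */
    int correct(int currentOff) {
        int lo = 0;
        int hi = size - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (offsets[mid] <= currentOff) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? currentOff : currentOff + cumulativeDiffs[found];
    }
}
```

No per-mapping object is allocated here, at the cost of keeping every mapping 
in the two arrays for the life of the filter.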




Re: lucene and solr trunk

2010-03-16 Thread Michael McCandless
On Tue, Mar 16, 2010 at 2:51 AM, Michael Busch  wrote:
> On 3/16/10 12:43 AM, Simon Willnauer wrote:
>>
>> If my impression should be wrong or if I miss something please ignore
>> the last paragraph.
>
> I feel exactly like you, Simon.  I don't understand the rush.  Also, we're
> in review-and-commit process, not commit-and-review.  Changes have to be
> proposed, discussed and ideally attached to jira as patches first.

There's obviously a lot of excitement driving the progress here, and
there's been awesome progress.  Things are moving fast, but...

Remember that all commits/fast iterations are being done on a branch,
so that people involved can make fast progress.

When we land that branch onto trunk, there will be the usual scrutiny
("review then commit") of the changes that're going in, and this email
was started to get the most important topic ("where does all this
land, anyway") going, first.

EG changes like a move to Java 1.6, disallowing compression in Solr's
schema.xml, the Version changes percolating into Solr, all obviously
need sizable review & discussion...

>> BTW: I still have the impression that if I don't follow IRC constantly
>> I'm missing important things.
>
> Me too.  I don't have the time to follow IRC in addition to jira and
> mailinglists.  I know I've been missing stuff, because in the past I
> commented on jira issues and later was told that my questions were already
> discussed thoroughly on IRC.  I've also seen jira issues that start with
> something like "Summary of IRC discussion:".

This is a hard problem...

IRC is a very good tool to enable those that have the time (and I
agree it's A LOT OF TIME -- I can't keep up with it either) to work
together.  Fast design discussions are a powerful way to bat around
random ideas, and I'd say IRC has already produced a number of good
ideas for improving Lucene (opened as issues, lately...).

But the thing to remember is of all the crazy discussions that happen
on IRC (and there are MANY that don't pan out), when a "real" idea
pans out, it must then go through the normal process -- turn into an
issue, comments are added summarizing the pros/cons that were
discussed on IRC, a patch is created and must be reviewed, iterated,
and then committed.  The RTC (review-then-commit) process is still
intact... it's just that IRC is a faster way for some devs to discuss
things that may turn into real ideas (or, may get dropped on the floor).

Does anyone know how other projects fold in IRC...?

Mike




[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845785#action_12845785
 ] 

Uwe Schindler commented on LUCENE-2098:
---

bq. Why did this cause Solr to slowdown...? Did Solr previously have a more 
efficient impl and then they cutover to Lucene's? 

Solr used another Filter in 1.3.




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845780#action_12845780
 ] 

Michael McCandless commented on LUCENE-2312:


bq. In thinking about the terms dictionary, we're going to run into concurrency 
issues right if we just use TreeMap? 

Right, we need a concurrent data structure here.  It's OK if there've been 
changes to this shared data structure since a reader was opened -- that reader 
knows its max doc id and so it can skip a term if the first doc id in that term 
is > that max.
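
A minimal sketch of that skip rule, under stated assumptions: the class name 
RamTermsDict and its methods are hypothetical, and java.util.concurrent's 
ConcurrentSkipListMap stands in for whatever concurrent sorted structure 
would actually be chosen. Each term records the first doc id that contained 
it, and a reader opened at a given max doc id ignores terms that first 
appeared later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch: sorted, concurrently updatable terms dictionary. */
final class RamTermsDict {
    // term -> first doc id containing it; keys are kept in sorted order.
    private final ConcurrentSkipListMap<String, Integer> firstDocByTerm =
        new ConcurrentSkipListMap<>();

    /** Called by indexing threads; keeps the smallest doc id seen for the term. */
    void addOccurrence(String term, int docId) {
        firstDocByTerm.merge(term, docId, (a, b) -> Math.min(a, b));
    }

    /** Terms in sorted order, as seen by a reader that was opened at maxDoc. */
    List<String> termsVisibleTo(int maxDoc) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Integer> e : firstDocByTerm.entrySet()) {
            if (e.getValue() <= maxDoc) {   // skip terms added after the reader opened
                visible.add(e.getKey());
            }
        }
        return visible;
    }
}
```

The iteration is weakly consistent, so a reader may walk past entries added 
after it was opened; the max doc id check is what hides them.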

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 3.0.1
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: 3.1
>
>
> In order to offer users near-realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845778#action_12845778
 ] 

Michael McCandless commented on LUCENE-2312:


{quote}
The prototype I'm experimenting with has a fixed length postings format for the 
in-memory representation (in TermsHash). Basically every posting has 4 bytes, 
so I can use int[] arrays (instead of the byte[] pools). The first 3 bytes are 
used for an absolute docID (not delta-encoded). This limits the max in-memory 
segment size to 2^24 docs. The 1 remaining byte is used for the position. With 
a max doc length of 140 characters you can fit every possible position in a 
byte - what a luxury!  If a term occurs multiple times in the same doc, then 
the TermDocs just skips multiple occurrences with the same docID and increments 
the freq. Again, the same term doesn't occur often in super short docs.

The int[] slices also don't have forward pointers, like in Lucene's TermsHash, 
but backwards pointers. In real-time search you often want a strongly 
time-biased ranking. A PostingList object has a pointer that points to the last 
posting (this statement is not 100% correct for visibility reasons across 
threads, but we can imagine it this way for now). A TermDocs can now traverse 
the postinglists in opposite order. Skipping can be done by following pointers 
to previous slices directly, or by binary search within a slice.
{quote}
This sounds nice!

This would be a custom indexing chain for docs guaranteed not to be over 255 
positions in length right?
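
A minimal sketch of the 4-byte posting described in the quote, for 
illustration only. The class name PackedPosting is hypothetical, and putting 
the doc id in the high 24 bits and the position in the low 8 bits is an 
assumption; the quote only says 3 bytes for the absolute doc id and 1 byte 
for the position.

```java
/** Hypothetical sketch: one posting packed into a single int. */
final class PackedPosting {
    static final int MAX_DOC_ID = (1 << 24) - 1;  // 3 bytes for the absolute doc id
    static final int MAX_POSITION = 255;          // 1 byte for the position

    /** Packs an absolute doc id and a position into one int. */
    static int pack(int docId, int position) {
        if (docId < 0 || docId > MAX_DOC_ID) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        if (position < 0 || position > MAX_POSITION) {
            throw new IllegalArgumentException("position out of range: " + position);
        }
        return (docId << 8) | position;
    }

    static int docId(int posting) {
        return posting >>> 8;      // unsigned shift: the top bit may be set
    }

    static int position(int posting) {
        return posting & 0xFF;
    }

    /** Counts consecutive postings in [start, end) that share the doc id at start. */
    static int freq(int[] postings, int start, int end) {
        int doc = docId(postings[start]);
        int n = 0;
        for (int i = start; i < end && docId(postings[i]) == doc; i++) {
            n++;
        }
        return n;
    }
}
```

The freq helper mirrors the remark that repeated occurrences in one doc are 
skipped while the frequency is incremented.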




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845777#action_12845777
 ] 

Michael McCandless commented on LUCENE-2312:


bq. The tricky part is to make sure that a reader always sees a consistent 
snapshot of the index. At the same time a reader must not follow pointers to 
non-published locations (e.g. array blocks).

Right, I'm just not familiar specifically with what JMM says about one thread 
writing to a byte[] and another thread reading it.

In general, for our usage, the reader threads will never read into an area that 
has not yet been written to.  So that works in our favor (they can't cache 
those bytes if they didn't read them).  EXCEPT the CPU will have loaded the 
bytes on a word boundary and so if our reader thread reads only 1 byte, and no 
more (because this is now the end of the posting), the CPU may very well have 
pulled in the following 7 bytes (for example) and then illegally (according to 
our needs) cache them.

We'd better make some serious tests for this... including reader threads that 
just enum the postings for a single rarish term over and over while writer 
threads are indexing docs that occasionally have that term.  I think that's the 
worst case for JMM violation since the #bytes cached is small.

It's too bad there isn't higher level control on the CPU caching via java.  EG, 
in our usage, it'd be enough if we could call something like a 
System.flushCPUCache whenever a thread enters a newly reopened reader: when 
accessing postings via a given Reader we want point-in-time searching anyway, 
so any bytes cached by the CPU are perfectly fine.  We only need a CPU cache 
flush when a reader is reopened.
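
For reference, a minimal sketch of one standard way to get that visibility 
within the JMM, not something proposed in this comment: a single writer 
fills a byte[] and then publishes a limit through a volatile field, and a 
reader takes the limit once (e.g. when it is reopened) and only reads below 
it. The class name PublishedBuffer and its methods are hypothetical, and a 
single writer thread is assumed.

```java
/** Hypothetical sketch: single-writer buffer with a volatile published limit. */
final class PublishedBuffer {
    private final byte[] bytes;
    private volatile int published;  // number of bytes readers may safely read
    private int writePos;            // touched by the writer thread only

    PublishedBuffer(int capacity) {
        bytes = new byte[capacity];
    }

    /** Writer thread: appends a byte that is not yet visible to readers. */
    void append(byte b) {
        bytes[writePos++] = b;
    }

    /** Writer thread: the volatile write makes all earlier appends visible. */
    void publish() {
        published = writePos;
    }

    /** Reader thread: volatile read, taken once when a reader is (re)opened. */
    int snapshotLimit() {
        return published;
    }

    /** Reader thread: reads only below the limit it captured. */
    byte read(int index, int limit) {
        if (index < 0 || index >= limit) {
            throw new IndexOutOfBoundsException("index " + index + ", limit " + limit);
        }
        return bytes[index];
    }
}
```

A reader that keeps using an old limit gets point-in-time behaviour; bytes 
past that limit may or may not be visible to it, but it never looks at them.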




[jira] Commented: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845776#action_12845776
 ] 

Michael McCandless commented on LUCENE-2098:


Patch looks like it should be a good net improvement -- lookups of the 
offset correction should now be fast.  Insertion cost is probably higher (we 
likely create 3 new objects per insert: 2 Integers and one TreeMap$Entry), but 
I expect that's a good tradeoff.
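
The patch itself is not reproduced here. As a rough illustration of the 
tradeoff, assuming a java.util.TreeMap keyed by offset (the class name 
TreeMapOffsetCorrector is hypothetical; only the method name 
addOffCorrectMap comes from the javadoc quoted in the issue):

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: offset corrections looked up through a TreeMap. */
final class TreeMapOffsetCorrector {
    // offset -> cumulative diff that applies from that offset onward
    private final TreeMap<Integer, Integer> cumulativeDiffByOffset = new TreeMap<>();

    /** Each call boxes two ints and allocates one map entry. */
    void addOffCorrectMap(int off, int cumulativeDiff) {
        cumulativeDiffByOffset.put(off, cumulativeDiff);
    }

    /** O(log n) lookup of the last correction at or before currentOff. */
    int correct(int currentOff) {
        Map.Entry<Integer, Integer> e = cumulativeDiffByOffset.floorEntry(currentOff);
        return e == null ? currentOff : currentOff + e.getValue();
    }
}
```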




[jira] Updated: (LUCENE-2098) make BaseCharFilter more efficient in performance

2010-03-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2098:
---

Affects Version/s: 3.1 (was: 2.9)

Why did this cause Solr to slow down...?  Did Solr previously have a more 
efficient impl and then they cut over to Lucene's?




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-16 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845771#action_12845771
 ] 

Chris Male commented on LUCENE-2310:


Hi Shai,

{quote}
i like the idea of Document to implement Iterable, but how does that solve the 
case where someone wants to query how many fields a document has?
{quote}

It doesn't, but then I'd add a numFields() method maybe.  It seems like 
something with a small use case and so having it as a method on the side seems 
ideal.

{quote}
Will you still have getFields(), only now it will return an unmodifiable 
collection?
{quote}

Yes and no.  getFields will remain but with a modifiable list.  I will then 
deprecate the method and recommend people use the Iterable.  This gives 
everybody a chance to migrate during the 3.x versions.

{quote}
Maybe just do: (1) Doc implements Iterable and (2) Doc exposes numFields(), 
add(Field)?
{quote}

Yup, let's do that.  Unfortunately getFields will remain until 4.0.

{quote}
About remove(field), I thought of a possible scenario though I still don't 
think it's interesting enough - suppose that you pass your Document through a 
processing pipeline/chain, each handler adds fields as metadata to the 
Document. For example, annotators. It might be that a field A exists, only for 
a handler down the chain to understand A's meaning and then replace it w/ A1 
and A2. For that you'll want to be able to remove a field ... I guess we could 
add a remove method to Document, and if it'll be called while the fields are 
iterated on, a CME will be thrown, which is perfectly fine with me.
{quote}

With remove(...) I am trying to foresee what people might be doing via 
getFields() today and will no longer be able to do once it's gone.  We will 
have the ability to add and iterate, so having the functionality to remove 
seems to complete it.  I completely agree that if something happens and a CME 
is thrown, then that problem should be left to the user.

I think this clarifies this direction.  Document will be changed as follows:
- Document will become Iterable
- getFields() will be deprecated in favour of the Iterable
- numFields() will be added returning the number of fields
- remove(String) will be added allowing someone to remove Fields with the given 
name.  If a CME occurs, that's up to the user to handle.
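
The list above can be sketched roughly as follows. SketchDocument and SketchField are hypothetical stand-ins, not the real Lucene classes or the attached patch, and backing the document with an ArrayList is an assumption made only to show where a CME would come from.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Demo entry point for the sketch below.
final class DocumentSketchDemo {
    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument();
        doc.add(new SketchField("title", "a"));
        doc.add(new SketchField("body", "b"));
        doc.add(new SketchField("tmp", "c"));
        System.out.println(doc.numFields());
        try {
            for (SketchField f : doc) {
                // Removing while iterating: the caller has to deal with the CME.
                doc.remove("tmp");
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("CME left to the caller");
        }
    }
}

// Hypothetical minimal field: just a name and a value.
final class SketchField {
    private final String name;
    private final String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    String name() { return name; }
    String value() { return value; }
}

// Hypothetical Document shape: Iterable, numFields(), add(...), remove(String).
final class SketchDocument implements Iterable<SketchField> {
    private final List<SketchField> fields = new ArrayList<SketchField>();

    void add(SketchField field) { fields.add(field); }

    int numFields() { return fields.size(); }

    // Removes every field with the given name and returns how many were removed.
    int remove(String name) {
        int removed = 0;
        for (Iterator<SketchField> it = fields.iterator(); it.hasNext();) {
            if (it.next().name().equals(name)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Fail-fast iterator of the backing list; no defensive copy is made.
    public Iterator<SketchField> iterator() { return fields.iterator(); }
}
```

Handing out the list's own iterator keeps iteration cheap, at the price of the fail-fast behaviour discussed above.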

Cheers Shai!



> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch
>
>
> In order to move field type like functionality into its own class, we really 
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
> Currently AbstractField depends on Field, and does not provide much more 
> functionality than storing fields, most of which are being moved over to 
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
> possibly Fieldable), moving much of the functionality into Field and 
> FieldType.




RE: lucene and solr trunk

2010-03-16 Thread Uwe Schindler
Hi,

> And Lucene is on Java 1.5 and should be compiled with a 1.5 compiler,
> whereas Solr seems to be on 1.6 since yesterday? (Yonik added something
> to common-build.xml). On my development system I do not have Java 1.6
> installed as the default at all; I always use Java 1.5 for building
> Lucene. If we merge and have the two on different JVMs, the same
> problems we had with 1.4/1.5 will start again. Developers use 1.6
> methods because their compiler does not warn them. So everybody working
> on Lucene should at least have a Java 1.5 compiler and try to compile
> their changes before committing. I do this (I use 1.5 for development
> and 1.6 only on some of our servers).
> 
> So: If merge, keep both on Java 1.5 !!!

I changed common-build.xml in the new solr branch to Java 1.5 again, as there 
is currently no reason to change this and especially as it was not discussed 
anywhere.

Java 1.5 as the base for both solr and lucene is better, and the few features 
of Java 1.6 do not justify moving up. I have my development area configured 
with Java 1.5 and I only develop Lucene in 1.5. That way I am sure not to use 
the wrong methods when creating patches. You can still tell users to run with 
JRE 1.6, but development should stay on 1.5 for now.

Uwe





Re: lucene and solr trunk

2010-03-16 Thread Michael Busch

On 3/16/10 12:43 AM, Simon Willnauer wrote:

> If my impression should be wrong or if I miss something please ignore
> the last paragraph.

I feel exactly like you, Simon.  I don't understand the rush.  Also, 
we're in review-and-commit process, not commit-and-review.  Changes have 
to be proposed, discussed and ideally attached to jira as patches first.

> BTW: I still have the impression that if I don't follow IRC constantly
> I'm missing important things.

Me too.  I don't have the time to follow IRC in addition to jira and 
mailing lists.  I know I've been missing stuff, because in the past I 
commented on jira issues and later was told that my questions were 
already discussed thoroughly on IRC.  I've also seen jira issues that 
start with something like "Summary of IRC discussion:".


 Michael




Re: lucene and solr trunk

2010-03-16 Thread Michael Busch
I completely agree with Uwe and Hoss.  These questions need to be 
addressed first.


I still want to be able to only checkout Lucene code and run the Lucene 
build independently from Solr.  And Lucene needs to be able to release 
without Solr and the branching/tagging needs to support that as Uwe 
points out.


 Michael


Re: lucene and solr trunk

2010-03-16 Thread Simon Willnauer
On Tue, Mar 16, 2010 at 8:18 AM, Uwe Schindler  wrote:
> I propose another idea for now until the "module" decision is [DISCUSS]ed and 
> [VOTE]d:
>
> Let's create a new project folder with trunk and branches for combined trunk 
> development in SVN (this can later become the folder for the module 
> development). This folder simply contains a delegating build.xml (delegating 
> the common tasks like build and test and so on to solr and trunk). The folder 
> simply uses svn:externals SVN props to link current solr and lucene trunk as 
> subfolders. So developers that want to work on both can simply check out this 
> folder and SVN will resolve the externals. As this is trunk development, the 
> externals will be without rev numbers and relative, to avoid the http(s) 
> problem (SVN 1.5+ required).

+1 - if I recall correctly that is what Uwe and I proposed initially
on IRC when solr got copied. This makes a lot of sense as it
does not break anybody's checkouts and enables all "Solcene"
developers to 

Re: [DISCUSS] Do away with Contrib Committers and make core committers

2010-03-16 Thread Simon Willnauer
On Mon, Mar 15, 2010 at 10:54 PM, Ryan McKinley  wrote:
>>
>> Personally I'd prefer we just stop adding them, and the current ones work
>> their way up like normal if they are so inclined, or the ones that are not
>> even around anymore can just stay as they are.
>>
That sounds reasonable to me too. Yet, we should still make sure
contrib committers are able to commit to the new "modules", or wherever
we decide the contrib stuff ends up. It would seem odd if I were no
longer able to commit to the analyzers because they have moved out of
contrib into something new.

simon
>
> This seems reasonable to me.
>



RE: lucene and solr trunk

2010-03-16 Thread Uwe Schindler
Hi all,

I don't want to be against all other developers that voted +1 for the SVN 
"merge", but I am not happy with it. Most importantly for the reasons Hoss 
mentioned:

> : prime-time as the new solr trunk!  Lucene and Solr need to move to a
> : common trunk for a host of reasons, including single patches that can
> : cover both, shared tags and branches, and shared test code w/o a test
> : jar.
> 
> Without a clearer picture of how people envision development "overhead"
> working as we move forward, it's really hard to understand how any of
> these ideas make sense...
>   1) how should the automated build process(es) work?
>   2) how are we going to do branching/tagging for releases?
> particularly in situations where one product is ready for a release
> and the other isn't?
>   3) how are we going to deal with minor bug fix release tagging?
>   4) should it be possible for people to check out Lucene-Java w/o
> checking out Solr?

Those are important questions, and not simple to solve!

> (i suspect a whole lot of people who only care about the core library
> are going to really adamantly not want to have to check out all of Solr
> just to work on the core)

Exactly! The Solr checkout is really huge because of thousands of JAR files and 
so on. The worst thing we could do would be to merge all those JARs into one 
general lib folder or the like. Please do not! Lucene-core should stay a lib 
without any external deps.

> : Both projects move to a new trunk:
> :   /something/trunk/java, /something/trunk/solr

This would be the only option we have. This new folder could simply contain 
two dirs below it and a build.xml in the top level that delegates and builds 
first lucene, then solr. But you can also do this with separate checkouts and a 
simple script downloaded from the wiki.

The problems of this approach far outweigh the benefits:

In the original vote, we said Lucene can release without Solr.
Releasing (I was the last release manager) involves things like creating 
branches and tags. In SVN, if you create a branch, you copy everything from 
under trunk (or another branch) to a new folder below branches (for tags, under 
tags). "tags" on most SVN servers has an additional restriction: it is not 
possible to change anything under "tags" except by copying.

If we have that combined trunk folder and Lucene wants to release and creates 
a branch/tag, Solr is forced to do so too. OK, you could say we just branch 
the lucene folder and leave solr where it is. But that would be against 
conventions, and the branch checkout could not live alone.

I just repeat: we wanted to merge devs and not codebase! And merging devs is a 
"code change" clearly.

And Lucene is on Java 1.5 and should be compiled with a 1.5 compiler, whereas 
Solr seems to be on 1.6 since yesterday? (Yonik added something to 
common-build.xml). On my development system I do not have Java 1.6 installed as 
the default at all; I always use Java 1.5 for building Lucene. If we merge and 
have the two on different JVMs, the same problems we had with 1.4/1.5 will 
start again. Developers use 1.6 methods because their compiler does not warn 
them. So everybody working on Lucene should at least have a Java 1.5 compiler 
and try to compile their changes before committing. I do this (I use 1.5 for 
development and 1.6 only on some of our servers).

So: If merge, keep both on Java 1.5 !!!
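
As a small illustration of the "compiler does not warn" problem, here is a sketch that uses String.isEmpty(), which was added in Java 1.6. The class and method names are made up for this example. The first method compiles with a JDK 6 javac even under -source 1.5 -target 1.5 (unless -bootclasspath points at a 1.5 class library), and then fails on a 1.5 JRE.

```java
final class Java5CompatSketch {
    // String.isEmpty() exists only since Java 1.6. A JDK 6 compiler accepts
    // this even with -source 1.5 -target 1.5, because those flags do not swap
    // the class library; on a 1.5 JRE the call fails with NoSuchMethodError.
    static boolean emptyUsingJava6Api(String s) {
        return s.isEmpty();
    }

    // Equivalent check using only methods that are available in Java 1.5.
    static boolean emptyUsingJava5Api(String s) {
        return s.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(emptyUsingJava6Api(""));
        System.out.println(emptyUsingJava5Api(""));
    }
}
```

Building with a real 1.5 JDK, as described above, catches the first method at compile time.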

> my gut says something like this will make the most sense, assuming
> "/something/trunk" == "/java/trunk" and "java" actually means "core"
> ...

And that is how it looks currently and I am fine with it!

> ie: this discussion should really be part and parcel with how contribs
> should be reorged.

That is exactly what should be done. We should not now simply copy the folders 
somewhere for some "development simplification" that is not really one and 
creates more problems!

I propose another idea for now until the "module" decision is [DISCUSS]ed and 
[VOTE]d:

Let's create a new project folder with trunk and branches for combined trunk 
development in SVN (this can later become the folder for the module 
development). This folder simply contains a delegating build.xml (delegating 
the common tasks like build and test and so on to solr and trunk). The folder 
simply uses svn:externals SVN props to link current solr and lucene trunk as 
subfolders. So developers that want to work on both can simply check out this 
folder and SVN will resolve the externals. As this is trunk development, the 
externals will be without rev numbers and relative, to avoid the http(s) 
problem (SVN 1.5+ required).

For testing flex, we create a branch of this folder, still pointing to 
solr-trunk, but to the flex branch in Lucene.

One task of the main build.xml would be to copy all produced JAR files of 
Lucene into the correct build folder in Solr.

I hope that you all understand me, but I am against merging trunks (for now) 
until we have a clear module decision.

Uwe


-