[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849076#action_12849076
 ] 

Michael McCandless commented on LUCENE-2328:


Woops, fixed, thanks!

> IndexWriter.synced  field accumulates data leading to a Memory Leak
> ---
>
> Key: LUCENE-2328
> URL: https://issues.apache.org/jira/browse/LUCENE-2328
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
> Environment: all
>Reporter: Gregor Kaczor
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
> LUCENE-2328.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files; this is repeated 60k times, with an
> optimize run after every 2k files indexed. The index size is 50 KB. I
> analyzed the heap dump and realized that the IndexWriter.synced field
> occupied more than half of the heap. That field is a private HashSet
> without a getter; its task is to hold the files which have already been
> synced.
> There are two calls to addAll and one call to add on synced, but no remove
> or clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries
> which look like file names ("_e065_1.del", "_e067.cfs"), while the index
> directory contains only 10 files.
> I suspect synced is holding obsolete data.
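The accumulation pattern described in the report can be sketched in a few lines. This is illustrative only: the names are invented, it is not Lucene's actual IndexWriter code, and the prune step is just one possible fix in the spirit of the issue.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the leak pattern: a set that lives as long as
// the writer and only ever grows, even after segment files are deleted.
public class SyncedSetSketch {
    private final Set<String> synced = new HashSet<>();

    // Called whenever a file has been fsync'ed to stable storage.
    public void markSynced(String fileName) {
        synced.add(fileName);
    }

    // One possible fix: forget entries for files that no longer exist
    // in the index directory (e.g. merged-away segment files).
    public void pruneObsolete(Set<String> liveFiles) {
        synced.retainAll(liveFiles);
    }

    public int size() {
        return synced.size();
    }
}
```

Without the prune step, an application that indexes and deletes in a loop accumulates one entry per segment file ever synced, matching the tens of thousands of stale entries seen in the heap dump against a 10-file directory.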

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2252) stored field retrieve slow

2010-03-23 Thread John Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848972#action_12848972
 ] 

John Wang commented on LUCENE-2252:
---

Hi Mike:

 Sorry for the late reply. We have written something for this purpose: 

http://snaprojects.jira.com/wiki/display/KRTI/Krati+Performance+Evaluation

Thanks

-John

> stored field retrieve slow
> --
>
> Key: LUCENE-2252
> URL: https://issues.apache.org/jira/browse/LUCENE-2252
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Affects Versions: 3.0
>Reporter: John Wang
>
> IndexReader.document() on a stored field is rather slow. I did a simple 
> multi-threaded test and profiled it:
> 40+% of the time is spent getting the offset from the index file
> 30+% of the time is spent reading the count (i.e. the number of fields to load)
> I ran it on my laptop, where the disk isn't that great, but there still 
> seems to be much room for improvement, e.g. loading the field index file 
> into memory (for a 5M-doc index the extra memory footprint is 20 MB, 
> peanuts compared to the other stuff being loaded).
> On a related note, are there plans to have custom segments as part of the 
> flexible indexing feature?
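The suggestion above - keeping the field index in memory - can be sketched as follows. This is illustrative only, not Lucene's actual stored-fields code: the class and method names are invented.

```java
// Illustrative sketch: hold the per-document byte offsets from the
// stored-fields index file in a long[] in RAM, so locating a document's
// data is an array lookup instead of a file seek plus read.
// At 8 bytes per document, memory cost scales linearly with doc count.
public class InMemoryFieldIndex {
    private final long[] offsets; // offsets[docId] = byte offset of doc's fields

    public InMemoryFieldIndex(long[] offsetsLoadedFromIndexFile) {
        this.offsets = offsetsLoadedFromIndexFile;
    }

    public long offsetOf(int docId) {
        return offsets[docId]; // O(1), no I/O on the hot path
    }
}
```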




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848973#action_12848973
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Mike, you missed the latest patch, with the comment Shai requested:

{code}
@@ -85,6 +85,8 @@
* stable storage.  Lucene uses this to properly commit
* changes to the index, to prevent a machine/OS crash
* from corrupting the index.
+   * @deprecated use {@link #sync(Collection)} instead.
+   * For easy migration you can change your code to call sync(Collections.singleton(name))
*/
   @Deprecated
   public void sync(String name) throws IOException { // TODO 4.0 kill me
{code}
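As a small illustration of the migration path the new javadoc suggests: the single-name call becomes a singleton-collection call. SyncTarget below is a hypothetical stand-in for the real class (checked-exception details are omitted); only the call shape is the point.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class SyncMigration {
    // Hypothetical stand-in for a Directory-like class with both forms.
    interface SyncTarget {
        void sync(Collection<String> names);

        // Deprecated single-name form, forwarding as the comment suggests.
        @Deprecated
        default void sync(String name) {
            sync(Collections.singleton(name));
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        SyncTarget t = seen::addAll;
        t.sync("_e067.cfs");                         // old call site still works
        t.sync(Collections.singleton("_e067.cfs"));  // migrated call site
        System.out.println(seen);
    }
}
```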


> IndexWriter.synced  field accumulates data leading to a Memory Leak
> ---
>
> Key: LUCENE-2328
> URL: https://issues.apache.org/jira/browse/LUCENE-2328
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
> Environment: all
>Reporter: Gregor Kaczor
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
> LUCENE-2328.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files; this is repeated 60k times, with an
> optimize run after every 2k files indexed. The index size is 50 KB. I
> analyzed the heap dump and realized that the IndexWriter.synced field
> occupied more than half of the heap. That field is a private HashSet
> without a getter; its task is to hold the files which have already been
> synced.
> There are two calls to addAll and one call to add on synced, but no remove
> or clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries
> which look like file names ("_e065_1.del", "_e067.cfs"), while the index
> directory contains only 10 files.
> I suspect synced is holding obsolete data.




Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-23 Thread Marvin Humphrey
On Tue, Mar 23, 2010 at 01:30:42PM -0700, Otis Gospodnetic wrote:
> Archiving the logs feels like it would be useful, but realistically
> speaking, they would be pretty big and who has the time to read them after
> the fact?  

Someone who participated in the chat reviewing it while preparing a summary.

Marvin Humphrey




Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Michael Busch

On 3/23/10 2:12 PM, Yonik Seeley wrote:
> On Tue, Mar 23, 2010 at 5:07 PM, Michael Busch wrote:
>> OK I reran the tests sequentially with my LUCENE-2329 patch applied.  The
>> same test failed again:
>>
>> [junit] Test org.apache.solr.client.solrj.embedded.JettyWebappTest FAILED
>>
>> Everything else looks good.  So it should be ok to commit 2329?
>
> Yeah, of course!  Heavy committing going on in solr tests ;-)
>
> -Yonik

OK, done!

 Michael






[jira] Resolved: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-2329.
---

Resolution: Fixed

Committed revision 926791.

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-lived PostingList objects in 
> TermsHashPerField, we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to a dense termID.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling and remove the overhead of object 
> initialization and garbage collection.  Garbage collection especially should 
> benefit significantly when the JVM runs low on memory, because in such a 
> situation the GC mark times can get very long if there is a large number of 
> long-lived objects in memory.
> Another benefit could be building more efficient TermVectors.  We could avoid 
> having to store the term string per document in the TermVector and instead 
> just store the segment-wide termIDs.  This would reduce the size and also 
> make it easier to implement efficient algorithms that use TermVectors, 
> because no term mapping across documents in a segment would be necessary.  
> That improvement, though, can be made in a separate JIRA issue.




Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Yonik Seeley
On Tue, Mar 23, 2010 at 5:07 PM, Michael Busch  wrote:
> OK I reran the tests sequentially with my LUCENE-2329 patch applied.  The
> same test failed again:
>
> [junit] Test org.apache.solr.client.solrj.embedded.JettyWebappTest FAILED
>
>
> Everything else looks good.  So it should be ok to commit 2329?

Yeah, of course!  Heavy committing going on in solr tests ;-)

-Yonik




Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Mark Miller
If you do an update, your issue should be resolved. This is something we 
ran into the other day as well, and we have been solving it a bit at a time ;)


- Mark

On 03/23/2010 04:29 PM, Robert Muir wrote:
> Yeah it's a bit confusing... before, exceptions happening in other
> threads were silently hidden.
>
> Uwe fixed this in Lucene I think, and right now the verbosity is
> cranked for Solr, too.
> Yonik is hacking away at these tests to quiet the ones that are "truly
> expected" exceptions...
>
> At least I think I got this right...
>
> On Tue, Mar 23, 2010 at 4:26 PM, Michael Busch wrote:
>> I see.  And all the other exceptions printed are expected?
>>
>>  Michael
>>
>> On 3/23/10 1:20 PM, Robert Muir wrote:
>>> Thanks Michael, this isn't a parallel test problem at all, it's a
>>> sporadic problem with solr's jetty tests (the same problem I mentioned
>>> in the previous response).
>>>
>>> You might/will see this problem running the tests sequentially too.
>>>
>>> Test org.apache.solr.client.solrj.embedded.JettyWebappTest FAILED
>>>
>>> On Tue, Mar 23, 2010 at 4:15 PM, Michael Busch wrote:
>>>> Sorry for the lack of details.  Thought I had just not done an obvious
>>>> step.  Attached is the output from the Solr part.
>>>>
>>>> Btw: This machine is a Solr virgin; Solr never ran on it before.
>>>>
>>>>  Michael
>>>>
>>>> On 3/23/10 1:00 PM, Mark Miller wrote:
>>>>> Robert very recently committed some stuff that parallelizes the solr
>>>>> tests, which may still need to be worked out in all cases (if that is
>>>>> indeed the problem here). A variety of devs have tested it, but there
>>>>> may be a lingering issue?
>>>>>
>>>>> No helpful errors printed above BUILD FAILED? The line the errors you
>>>>> pasted give is simply the line that fails the build if tests failed.
>>>>>
>>>>> There is still a way to run them sequentially (as Hudson should be
>>>>> doing) that Robert should be able to let you in on as well. But it
>>>>> would be nice to get to the bottom of this.
>>>>>
>>>>> - Mark
>>>>>
>>>>> On 03/23/2010 03:36 PM, Michael Busch wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I wanted to commit LUCENE-2329.  I just checked out the new combined
>>>>>> trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran
>>>>>> "ant test".  After 20 mins the build failed on the unmodified code
>>>>>> (see below).  I hadn't applied my patch yet.
>>>>>>
>>>>>> What's the status of the combined trunk?  Should the tests pass?  As
>>>>>> far as I can tell all lucene tests were successful (core, contrib,
>>>>>> bw), but the Solr tests failed.  Is there more setup necessary for
>>>>>> the Solr part after 'svn checkout'?
>>>>>>
>>>>>>  Michael
>>>>>>
>>>>>> BUILD FAILED
>>>>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28:
>>>>>> The following error occurred while executing this line:
>>>>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393:
>>>>>> The following error occurred while executing this line:
>>>>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472:
>>>>>> Tests failed!
>>>>>> The following error occurred while executing this line:



[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848908#action_12848908
 ] 

Shai Erera commented on LUCENE-2215:


I must admit I don't like throwing UOE. I imagine the naive user calling one of 
these methods and getting hit with a UOE out of nowhere :). Perhaps it's a sign 
PagingCollector should not be a sub-class of TopDocsCollector? It does not 
benefit from it in any way, because it overrides all the main methods, 
implementing them or throwing UOE for those it doesn't like. So perhaps it 
should just be a TopScorePagingCollector which copies some of the functionality 
of TSDC but is not a TDC itself. It will have a topDocs() method, and only it 
(b/c I agree the rest don't make any sense).

Notice the different name I propose - to make it clear it's a collector that 
can be used for paging through a scored list of results.

BTW, I liked that the if/else clauses were separated, b/c you could include 
meaningful documentation for each. Right now those are just very long lines.

About in-order, I think the only thing you will save is the last 'else'. Read 
my comment above about wrapping TSDC ... not sure about it, but it would make 
it more elegant.

I'll review the rest of the patch. I didn't yet understand what PagingIterable 
is for ...

> paging collector
> 
>
> Key: LUCENE-2215
> URL: https://issues.apache.org/jira/browse/LUCENE-2215
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 2.4, 3.0
>Reporter: Adam Heinz
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: IterablePaging.java, LUCENE-2215.patch, 
> PagingCollector.java, TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough 
> votes on this issue to convince him to upload his patch.  :)




[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848904#action_12848904
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

bq. BTW, I've noticed that you don't track maxScore

Good point.  I think we probably should track it, so that the PagingCollector 
could be used right from the get-go.

We might also consider deprecating the topDocs() methods that take in 
parameters, and think about how the paging collector might be integrated at a 
lower level in the other collectors, such that one doesn't even have to think 
about calling a different collector.

> paging collector
> 
>
> Key: LUCENE-2215
> URL: https://issues.apache.org/jira/browse/LUCENE-2215
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 2.4, 3.0
>Reporter: Adam Heinz
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: IterablePaging.java, LUCENE-2215.patch, 
> PagingCollector.java, TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough 
> votes on this issue to convince him to upload his patch.  :)




[jira] Updated: (LUCENE-2215) paging collector

2010-03-23 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2215:


Attachment: LUCENE-2215.patch

Here's an update of Aaron's work with the following changes:

1. Added real unit tests.
2. Made topDocs() non-final in order to override it in PagingCollector to 
handle some edge cases where the PQ size is larger than the total hits.  
Overrode the other topDocs(...) methods to throw UnsupportedOperationException, 
as they aren't needed for a paging collector.
3. Pass in the number of hits already seen so that PQ operations can be 
calculated correctly.  Not sure if we really need it; otherwise it puts the 
burden on the user to make sure the PQ is sized properly, which may not be such 
a bad burden.
4. Renamed IterablePaging to PagingIterable.  Not a huge fan of that name 
either, but couldn't think of anything better.
5. Collapsed the if/else clauses in the collect method into a single if clause.

Left to do:
1. Benchmark.  Is it really better?
2. Not entirely certain about the PagingIterable API yet.  Looks useful.
3. Should we have an in-order collector as well?  Seems like we might be able 
to save a few operations per doc.

> paging collector
> 
>
> Key: LUCENE-2215
> URL: https://issues.apache.org/jira/browse/LUCENE-2215
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 2.4, 3.0
>Reporter: Adam Heinz
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: IterablePaging.java, LUCENE-2215.patch, 
> PagingCollector.java, TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough 
> votes on this issue to convince him to upload his patch.  :)




[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896
 ] 

Shai Erera commented on LUCENE-2215:


I've reviewed PagingCollector.java, and the first thing I have to say about it 
is that I really like it! :) It saves lots of unnecessary heapify code, if the 
application can allow itself to store the lowest last ScoreDoc.

I have a few comments/questions.

I don't understand what getLastScoreDoc is for. Is it just a utility method? Is 
it something the app can compute by itself? Anyway, it lacks javadocs, so 
perhaps if they existed I wouldn't need to ask ;).

In collect(), there's the following code:
{code}
} else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) {
    // if the scores are the same and the doc is less than or equal to
    // the previous pass lowest hit doc then skip because this collector
    // favors lower number documents.
    return;
{code}

I think there's a typo in the comment "favors lower number documents" ... while 
it seems to prefer higher doc IDs? The way I understand it, regardless of 
whether docs are collected in or out of order, HitQueue ensures that when 
scores are equal, the lowest IDs are favored. Thus the first round always keeps 
the lowest IDs among the docs whose scores match. The next round will favor the 
docs whose IDs come next, and so forth ... am I right? (just clarifying my 
understanding)
If that's the case, I think it'll be good if it's spelled out in the comment, 
and also to mention that it means the document has already been returned 
previously (like it's documented in the previous 'if').

The last 'else' really looks like TSDC's out-of-order version, which makes me 
wonder whether PagingCollector can be viewed as a filter on top of TSDC (and 
possibly even TopFieldCollector)? So if a hit should be collected, it just 
calls super.collect? I realize though that a Collector is a hotspot and we want 
to minimize 'if' statements, let alone method calls, as much as possible. But 
it just feels so strongly like it should be a filter ... :). And you wouldn't 
need to specifically handle in/out-of-orderness ... and w/ the right design, it 
could also wrap a TFC or any other TDC implementation ...

BTW, I've noticed that you don't track maxScore - is it assumed that the 
application stores it from the first round? If so, I'd document it, because the 
application needs to know it should use TSDC for the first round and 
PagingCollector from the second round on.

Also, PagingCollector offers a ctor which does not force the application to 
pass in a ScoreDoc. See my comment above - it might be misleading, because if 
you use this collector right from the very first search, you lose the maxScore 
tracking. I also don't see why it should be allowed - if a dummy 
previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. 
I think this collector should be used only from the second round on, and a 
single ctor which forces a ScoreDoc to be passed would make more sense. If the 
application wishes to shoot itself in the foot (performance-wise), it can pass 
a dummy ScoreDoc itself.
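The skip test being discussed can be sketched as a small predicate. This is illustrative only, not the attached patch's code: a hit was already returned on an earlier page if it scored higher than the previous pass's lowest hit, or scored the same with a lower-or-equal doc ID (ties break toward lower doc IDs, as described above).

```java
// Illustrative sketch of a paging collector's "already returned" test.
public class PagingSkip {
    public static boolean alreadyReturned(float score, int doc,
                                          float prevLowestScore, int prevLowestDoc) {
        if (score > prevLowestScore) {
            return true; // collected on an earlier, higher-scoring page
        }
        // equal score: earlier pages kept the lower doc IDs first
        return score == prevLowestScore && doc <= prevLowestDoc;
    }
}
```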

> paging collector
> 
>
> Key: LUCENE-2215
> URL: https://issues.apache.org/jira/browse/LUCENE-2215
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 2.4, 3.0
>Reporter: Adam Heinz
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: IterablePaging.java, PagingCollector.java, 
> TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough 
> votes on this issue to convince him to upload his patch.  :)




Re: #lucene IRC log [was: RE: lucene and solr trunk]

2010-03-23 Thread Otis Gospodnetic
Uh, the IRC logs... Do people really think making those *searchable* would be 
useful?

I think they'd be *extremely* noisy and hard to interpret without a person 
really just sequentially reading them.  Lots of people talking at the same 
time, multiple topics, lots of very short intertwined messages that always need 
a lot of context, aren't threadable, etc.


Archiving the logs feels like it would be useful, but realistically speaking, 
they would be pretty big and who has the time to read them after the fact?  You 
guys all read the recent issue of The Economist, right? ;)

Otis 

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>From: Ian Holsman
>To: java-dev@lucene.apache.org
>Sent: Thu, March 18, 2010 1:47:56 AM
>Subject: Re: #lucene IRC log [was: RE: lucene and solr trunk]
>
>+1
>
>I'd like to see the IRC logs added to things like
>http://search-lucene.com/ and
>http://www.lucidimagination.com/search/?q=IRC&Search=Search
>
>while it might not be great for decision making.. it is amazing for
>helping debug common problems people have
>
>On 3/17/10 7:10 AM, Chris Hostetter wrote:
>>: with, "if it didn't happen on the lists, it didn't happen". Its the same as
>>
>>+1
>>
>>But as the IRC channel gets used more and more, it would *also* be nice if
>>there was an archive of the IRC channel so that there is a place to go
>>look to understand the back story behind an idea once it's synthesized and
>>posted to the lists/jira.
>>
>>That's the huge advantage IRC has over informal conversations at
>>hackathons, apachecon, and meetups -- there can in fact be easily
>>archivable/parsable/searchable records of the communication.
>>
>>-Hoss
Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Robert Muir
Yeah it's a bit confusing... before, exceptions happening in other
threads were silently hidden.

Uwe fixed this in Lucene I think, and right now the verbosity is
cranked for Solr, too.
Yonik is hacking away at these tests to quiet the ones that are "truly
expected" exceptions...

At least I think I got this right...

On Tue, Mar 23, 2010 at 4:26 PM, Michael Busch  wrote:
> I see.  And all the other exceptions printed are expected?
>
>  Michael
>
> On 3/23/10 1:20 PM, Robert Muir wrote:
>>
>> Thanks Michael, this isn't a parallel test problem at all, it's a
>> sporadic problem with solr's jetty tests (the same problem I mentioned
>> in the previous response).
>>
>> You might/will see this problem running the tests sequentially too.
>>
>> Test org.apache.solr.client.solrj.embedded.JettyWebappTest FAILED
>>
>> On Tue, Mar 23, 2010 at 4:15 PM, Michael Busch  wrote:
>>
>>>
>>> Sorry for the lack of details.  Thought I had just not done an obvious
>>> step.
>>> Attached is the output from the Solr part.
>>>
>>> Btw: This machine is a Solr virgin,  Solr never ran on it before.
>>>
>>>  Michael
>>>
>>> On 3/23/10 1:00 PM, Mark Miller wrote:
>>>

>>>> Robert very recently committed some stuff that parallelizes the solr
>>>> tests, which may still need to be worked out in all cases (if that is
>>>> indeed the problem here). A variety of devs have tested it, but there
>>>> may be a lingering issue?
>>>>
>>>> No helpful errors printed above BUILD FAILED? The line the errors you
>>>> pasted give is simply the line that fails the build if tests failed.
>>>>
>>>> There is still a way to run them sequentially (as Hudson should be
>>>> doing) that Robert should be able to let you in on as well. But it
>>>> would be nice to get to the bottom of this.
>>>>
>>>> - Mark
>>>>
>>>> On 03/23/2010 03:36 PM, Michael Busch wrote:
>>>>> Hi all,
>>>>>
>>>>> I wanted to commit LUCENE-2329.  I just checked out the new combined
>>>>> trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran
>>>>> "ant test".  After 20 mins the build failed on the unmodified code
>>>>> (see below).  I hadn't applied my patch yet.
>>>>>
>>>>> What's the status of the combined trunk?  Should the tests pass?  As
>>>>> far as I can tell all lucene tests were successful (core, contrib,
>>>>> bw), but the Solr tests failed.  Is there more setup necessary for
>>>>> the Solr part after 'svn checkout'?
>>>>>
>>>>>  Michael
>>>>>
>>>>> BUILD FAILED
>>>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28:
>>>>> The following error occurred while executing this line:
>>>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393:
>>>>> The following error occurred while executing this line:
>>>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472:
>>>>> Tests failed!
>>>>> The following error occurred while executing this line:

Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Michael Busch

I see.  And all the other exceptions printed are expected?

 Michael

On 3/23/10 1:20 PM, Robert Muir wrote:

Thanks Michael, this isn't a parallel test problem at all; it's a
sporadic problem with Solr's Jetty tests (the same problem I mentioned
in the previous response).

You might/will see this problem running the tests sequentially too.

Test org.apache.solr.client.solrj.embedded.JettyWebappTest FAILED

On Tue, Mar 23, 2010 at 4:15 PM, Michael Busch  wrote:
   

Sorry for the lack of details.  Thought I had just not done an obvious step.
Attached is the output from the Solr part.

Btw: This machine is a Solr virgin; Solr never ran on it before.

  Michael

On 3/23/10 1:00 PM, Mark Miller wrote:
 

Robert very recently committed some stuff that parallelizes the solr tests
that may need to be worked out in all cases still (if that is indeed the
problem here). A variety of devs have tested it, but there may be a
lingering issue?

No helpful errors printed above BUILD FAILED? The line number in the errors you
pasted is simply the line that fails the build when tests fail.

There is still a way to run them sequentially (as Hudson should be doing)
that Robert should be able to let you in on as well. But it would be nice to
get to the bottom of this.

- Mark

On 03/23/2010 03:36 PM, Michael Busch wrote:
   

Hi all,

I wanted to commit LUCENE-2329.  I just checked out the new combined
trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran "ant test".
  After 20 mins the build failed on the unmodified code (see below).  I
hadn't applied my patch yet.

What's the status of the combined trunk?  Should the tests pass?  As far
as I can tell all lucene tests were successful (core, contrib, bw), but the
Solr tests failed.  Is there more setup for the Solr part necessary after
'svn checkout'?

  Michael

BUILD FAILED
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28:
The following error occurred while executing this line:

/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393:
The following error occurred while executing this line:

/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472:
Tests failed!

Total time: 19 minutes 38 seconds


---

Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Michael Busch

On 3/23/10 1:07 PM, Robert Muir wrote:

Maybe; the Solr test TestLBHttpSolrServer failed for me randomly
before this parallelization, though, and still does.
In general the Jetty tests have caused me some grief.

But it's equally likely I broke it for you somehow...

Michael, can you try running with -Dsequential-tests=1 ?

   


Sure - it's running now, will send the results when it's done.


Apologies if it's something I caused.

   


Hey don't apologize!  You've done tons of work!

 Michael

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Robert Muir
Thanks Michael, this isn't a parallel test problem at all; it's a
sporadic problem with Solr's Jetty tests (the same problem I mentioned
in the previous response).

You might/will see this problem running the tests sequentially too.

Test org.apache.solr.client.solrj.embedded.JettyWebappTest FAILED

On Tue, Mar 23, 2010 at 4:15 PM, Michael Busch  wrote:
> Sorry for the lack of details.  Thought I had just not done an obvious step.
> Attached is the output from the Solr part.
>
> Btw: This machine is a Solr virgin,  Solr never ran on it before.
>
>  Michael
>
> On 3/23/10 1:00 PM, Mark Miller wrote:
>>
>> Robert very recently committed some stuff that parallelizes the solr tests
>> that may need to be worked out in all cases still (if that is indeed the
>> problem here). A variety of devs have tested it, but there may be a
>> lingering issue?
>>
>> No helpful errors printed above BUILD FAILED? The line the errors you
>> pasted gives is simply the line that fails the build if tests failed.
>>
>> There is still a way to run them sequentially (as Hudson should be doing)
>> that Robert should be able to let you in on as well. But it would be nice to
>> get to the bottom of this.
>>
>> - Mark
>>
>> On 03/23/2010 03:36 PM, Michael Busch wrote:
>>>
>>> Hi all,
>>>
>>> I wanted to commit LUCENE-2329.  I just checked out the new combined
>>> trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran "ant test".
>>>  After 20 mins the build failed on the unmodified code (see below).  I
>>> hadn't applied my patch yet.
>>>
>>> What's the status of the combined trunk?  Should the tests pass?  As far
>>> as I can tell all lucene tests were successful (core, contrib, bw), but the
>>> Solr tests failed.  Is there more setup for the Solr part necessary after
>>> 'svn checkout'?
>>>
>>>  Michael
>>>
>>> BUILD FAILED
>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28:
>>> The following error occurred while executing this line:
>>>
>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393:
>>> The following error occurred while executing this line:
>>>
>>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472:
>>> Tests failed!

Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Robert Muir
Maybe; the Solr test TestLBHttpSolrServer failed for me randomly
before this parallelization, though, and still does.
In general the Jetty tests have caused me some grief.

But it's equally likely I broke it for you somehow...

Michael, can you try running with -Dsequential-tests=1 ?

Apologies if it's something I caused.

On Tue, Mar 23, 2010 at 4:00 PM, Mark Miller  wrote:
> Robert very recently committed some stuff that parallelizes the solr tests
> that may need to be worked out in all cases still (if that is indeed the
> problem here). A variety of devs have tested it, but there may be a
> lingering issue?
>
> No helpful errors printed above BUILD FAILED? The line the errors you pasted
> gives is simply the line that fails the build if tests failed.
>
> There is still a way to run them sequentially (as Hudson should be doing)
> that Robert should be able to let you in on as well. But it would be nice to
> get to the bottom of this.
>
> - Mark
>
> On 03/23/2010 03:36 PM, Michael Busch wrote:
>>
>> Hi all,
>>
>> I wanted to commit LUCENE-2329.  I just checked out the new combined trunk
>> https://svn.apache.org/repos/asf/lucene/dev/trunk and ran "ant test".  After
>> 20 mins the build failed on the unmodified code (see below).  I hadn't
>> applied my patch yet.
>>
>> What's the status of the combined trunk?  Should the tests pass?  As far
>> as I can tell all lucene tests were successful (core, contrib, bw), but the
>> Solr tests failed.  Is there more setup for the Solr part necessary after
>> 'svn checkout'?
>>
>>  Michael
>>
>> BUILD FAILED
>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28:
>> The following error occurred while executing this line:
>>
>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393:
>> The following error occurred while executing this line:
>>
>> /Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472:
>> Tests failed!
>>
>> Total time: 19 minutes 38 seconds
>>
>>
>> -
>> To unsubscribe, e-mail:

Re: Running the Solr/Lucene tests failed

2010-03-23 Thread Mark Miller
Robert very recently committed some changes that parallelize the Solr 
tests, which may still need to be worked out in all cases (if that is 
indeed the problem here). A variety of devs have tested it, but there 
may be a lingering issue?


No helpful errors printed above BUILD FAILED? The line number in the errors you 
pasted is simply the line that fails the build when tests fail.


There is still a way to run them sequentially (as Hudson should be 
doing) that Robert should be able to let you in on as well. But it would 
be nice to get to the bottom of this.


- Mark

On 03/23/2010 03:36 PM, Michael Busch wrote:

Hi all,

I wanted to commit LUCENE-2329.  I just checked out the new combined 
trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran "ant 
test".  After 20 mins the build failed on the unmodified code (see 
below).  I hadn't applied my patch yet.


What's the status of the combined trunk?  Should the tests pass?  As 
far as I can tell all lucene tests were successful (core, contrib, 
bw), but the Solr tests failed.  Is there more setup for the Solr part 
necessary after 'svn checkout'?


 Michael

BUILD FAILED
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28: 
The following error occurred while executing this line:
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393: 


The following error occurred while executing this line:
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472: 
Tests failed!



Total time: 19 minutes 38 seconds


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




--
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Running the Solr/Lucene tests failed

2010-03-23 Thread Uwe Schindler
The last hudson run worked two hours ago...

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael Busch [mailto:busch...@gmail.com]
> Sent: Tuesday, March 23, 2010 8:37 PM
> To: java-dev@lucene.apache.org; solr-...@lucene.apache.org
> Subject: Running the Solr/Lucene tests failed
> 
> Hi all,
> 
> I wanted to commit LUCENE-2329.  I just checked out the new combined
> trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran "ant
> test".  After 20 mins the build failed on the unmodified code (see
> below).  I hadn't applied my patch yet.
> 
> What's the status of the combined trunk?  Should the tests pass?  As
> far
> as I can tell all lucene tests were successful (core, contrib, bw), but
> the Solr tests failed.  Is there more setup for the Solr part necessary
> after 'svn checkout'?
> 
>   Michael
> 
> BUILD FAILED
> /Users/michael/Documents/workspace/lucene-solr-
> trunk/trunk/build.xml:28:
> The following error occurred while executing this line:
> /Users/michael/Documents/workspace/lucene-solr-
> trunk/trunk/solr/build.xml:393:
> 
> The following error occurred while executing this line:
> /Users/michael/Documents/workspace/lucene-solr-
> trunk/trunk/solr/build.xml:472:
> Tests failed!
> 
> Total time: 19 minutes 38 seconds
> 
> 
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Running the Solr/Lucene tests failed

2010-03-23 Thread Michael Busch

Hi all,

I wanted to commit LUCENE-2329.  I just checked out the new combined 
trunk https://svn.apache.org/repos/asf/lucene/dev/trunk and ran "ant 
test".  After 20 mins the build failed on the unmodified code (see 
below).  I hadn't applied my patch yet.


What's the status of the combined trunk?  Should the tests pass?  As far 
as I can tell all lucene tests were successful (core, contrib, bw), but 
the Solr tests failed.  Is there more setup for the Solr part necessary 
after 'svn checkout'?


 Michael

BUILD FAILED
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/build.xml:28: 
The following error occurred while executing this line:
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:393: 


The following error occurred while executing this line:
/Users/michael/Documents/workspace/lucene-solr-trunk/trunk/solr/build.xml:472: 
Tests failed!



Total time: 19 minutes 38 seconds


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848855#action_12848855
 ] 

Michael Busch commented on LUCENE-2329:
---

Cool, will do!  Thanks for the review and good questions... and the whole idea! 
:)

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-living PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to a dense termID.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This avoids 
> the need for object pooling and removes the overhead of object initialization 
> and garbage collection.  Garbage collection in particular should benefit 
> significantly when the JVM runs low on memory, because in such a situation 
> the GC mark times can get very long if there is a large number of 
> long-living objects in memory.
> Another benefit could be more efficient TermVectors.  We could avoid having 
> to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  That improvement can be made in a separate JIRA issue, though.
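As a rough plain-Java sketch of the layout described above (the class and method names here are illustrative, not Lucene's actual TermsHashPerField code, and a HashMap stands in for Lucene's custom int[] hash): each term gets a dense termID, and all per-term data lives in parallel primitive arrays indexed by that ID, so no per-term PostingList objects are ever allocated.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of the parallel-array idea from LUCENE-2329.
// Names (ParallelPostings, add, freq) are illustrative, not Lucene's API.
public class ParallelPostings {
    // term -> dense termID (Lucene would use an int[] hash over char pools;
    // a HashMap keeps this sketch short)
    private final Map<String, Integer> termIds = new HashMap<>();

    // Parallel arrays indexed by termID -- these replace one PostingList
    // object per term, so the GC sees a handful of large arrays instead of
    // millions of small long-lived objects.
    private int[] freqs;
    private int[] lastDocIds;
    private int nextId = 0;

    public ParallelPostings(int initialCapacity) {
        freqs = new int[initialCapacity];
        lastDocIds = new int[initialCapacity];
    }

    /** Records one occurrence of term in docId; returns the term's dense ID. */
    public int add(String term, int docId) {
        Integer id = termIds.get(term);
        if (id == null) {
            id = nextId++;
            termIds.put(term, id);
            if (id >= freqs.length) grow();
        }
        freqs[id]++;
        lastDocIds[id] = docId;
        return id;
    }

    public int freq(int termId) { return freqs[termId]; }

    // Doubles all parallel arrays together, keeping them index-aligned.
    private void grow() {
        int newLen = freqs.length * 2;
        freqs = Arrays.copyOf(freqs, newLen);
        lastDocIds = Arrays.copyOf(lastDocIds, newLen);
    }

    public static void main(String[] args) {
        ParallelPostings p = new ParallelPostings(4);
        int a = p.add("lucene", 1);
        p.add("lucene", 2);
        int b = p.add("solr", 2);
        System.out.println(a + " " + b + " " + p.freq(a)); // prints "0 1 2"
    }
}
```

The same dense termIDs are what would let TermVectors store ints instead of term strings, as the description suggests.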

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2323) reorganize contrib modules

2010-03-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848853#action_12848853
 ] 

Hoss Man commented on LUCENE-2323:
--

bq. If no one objects, (especially including Hoss Man)

I really have no opinions; I was just trying to chime in with my memories of 
the past discussions -- I don't necessarily think one way or another is 
good/bad or right/wrong.

Go with your gut.

> reorganize contrib modules
> --
>
> Key: LUCENE-2323
> URL: https://issues.apache.org/jira/browse/LUCENE-2323
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: LUCENE-2323.patch
>
>
> it would be nice to reorganize contrib modules, so that they are bundled 
> together by functionality.
> For example:
> * the wikipedia contrib is a tokenizer, i think really belongs in 
> contrib/analyzers
> * there are two highlighters, i think could be one highlighters package.
> * there are many queryparsers and queries in different places in contrib

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848842#action_12848842
 ] 

Grant Ingersoll commented on LUCENE-2215:
-

I think that in order to properly implement this, topDocs() needs to be 
non-final; otherwise there are some oddities in initializing a PQ with more 
results than are available once paging.  Updated patch shortly.

> paging collector
> 
>
> Key: LUCENE-2215
> URL: https://issues.apache.org/jira/browse/LUCENE-2215
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 2.4, 3.0
>Reporter: Adam Heinz
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: IterablePaging.java, PagingCollector.java, 
> TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough 
> votes on this issue to convince him to upload his patch.  :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848833#action_12848833
 ] 

Michael McCandless commented on LUCENE-2329:


OK, indeed it does sound reasonable!  Sweet :)  I think you should commit it!  
Make sure you "svn switch" your checkout first :)  And pass Solr's tests!


> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-lived PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to a dense termID.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling and remove the overhead of object 
> initialization and garbage collection.  Garbage collection in particular 
> should benefit significantly when the JVM runs low on memory, because in such 
> a situation the gc mark times can get very long if there is a large number of 
> long-lived objects in memory.
> Another benefit could be building more efficient TermVectors.  We could avoid 
> having to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  We can make this improvement in a separate JIRA issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848827#action_12848827
 ] 

Michael Busch edited comment on LUCENE-2329 at 3/23/10 6:06 PM:


{quote}
They save the object header per-unique-term, and 4 bytes on 64bit JREs since 
the "pointer" is now an int and not a real pointer?
{quote}

On 64-bit JVMs (which I used for my tests) we actually save 28 bytes per 
unique term:

h4. Trunk:
{code}
// Why + 4*POINTER_NUM_BYTE below?
//   +1: Posting is referenced by postingsFreeList array
//   +3: Posting is referenced by hash, which
//   targets 25-50% fill factor; approximate this
//   as 3X # pointers
bytesPerPosting = consumer.bytesPerPosting() + 
4*DocumentsWriter.POINTER_NUM_BYTE;

...

  @Override
  int bytesPerPosting() {
return RawPostingList.BYTES_SIZE + 4 * DocumentsWriter.INT_NUM_BYTE;
  }

...
abstract class RawPostingList {
  final static int BYTES_SIZE = DocumentsWriter.OBJECT_HEADER_BYTES + 
3*DocumentsWriter.INT_NUM_BYTE;

...

  // Coarse estimates used to measure RAM usage of buffered deletes
  final static int OBJECT_HEADER_BYTES = 8;
  final static int POINTER_NUM_BYTE = Constants.JRE_IS_64BIT ? 8 : 4;
{code}

This needs 8 bytes + 3 * 4 bytes + 4 * 4 bytes + 4 * 8 bytes = 68 bytes. 

h4. 2329:
{code}
//   +3: Posting is referenced by hash, which
//   targets 25-50% fill factor; approximate this
//   as 3X # pointers
bytesPerPosting = consumer.bytesPerPosting() + 
3*DocumentsWriter.INT_NUM_BYTE;

...

  @Override
  int bytesPerPosting() {
return ParallelPostingsArray.BYTES_PER_POSTING + 4 * 
DocumentsWriter.INT_NUM_BYTE;
  }

...

final static int BYTES_PER_POSTING = 3 * DocumentsWriter.INT_NUM_BYTE;
{code}

This needs 3 * 4 bytes + 4 * 4 bytes + 3 * 4 bytes = 40 bytes.


I checked how many bytes were allocated for postings when the first segment was 
flushed.  Trunk flushed after 6400 docs and had 103MB allocated for PostingList 
objects.  2329 flushed after 8279 docs and had 94MB allocated for the parallel 
arrays, and 74MB out of the 94MB were actually used.

The first docs in the wikipedia dataset seem pretty large with many unique 
terms.

I think this sounds reasonable?

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was 

[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848827#action_12848827
 ] 

Michael Busch commented on LUCENE-2329:
---

{quote}
They save the object header per-unique-term, and 4 bytes on 64bit JREs since 
the "pointer" is now an int and not a real pointer?
{quote}

On 64-bit JVMs (which I used for my tests) we actually save 28 bytes per 
unique term:

h4. Trunk:
{code}
// Why + 4*POINTER_NUM_BYTE below?
//   +1: Posting is referenced by postingsFreeList array
//   +3: Posting is referenced by hash, which
//   targets 25-50% fill factor; approximate this
//   as 3X # pointers
bytesPerPosting = consumer.bytesPerPosting() + 
4*DocumentsWriter.POINTER_NUM_BYTE;

...

  @Override
  int bytesPerPosting() {
return RawPostingList.BYTES_SIZE + 4 * DocumentsWriter.INT_NUM_BYTE;
  }

...
abstract class RawPostingList {
  final static int BYTES_SIZE = DocumentsWriter.OBJECT_HEADER_BYTES + 
3*DocumentsWriter.INT_NUM_BYTE;

...

  // Coarse estimates used to measure RAM usage of buffered deletes
  final static int OBJECT_HEADER_BYTES = 8;
  final static int POINTER_NUM_BYTE = Constants.JRE_IS_64BIT ? 8 : 4;
{code}

This needs 8 bytes + 3 * 4 bytes + 4 * 4 bytes + 4 * 8 bytes = 68 bytes. 

h4. 2329:
{code}
//   +3: Posting is referenced by hash, which
//   targets 25-50% fill factor; approximate this
//   as 3X # pointers
bytesPerPosting = consumer.bytesPerPosting() + 
3*DocumentsWriter.INT_NUM_BYTE;

...

  @Override
  int bytesPerPosting() {
return ParallelPostingsArray.BYTES_PER_POSTING + 4 * 
DocumentsWriter.INT_NUM_BYTE;
  }

...

final static int BYTES_PER_POSTING = 3 * DocumentsWriter.INT_NUM_BYTE;
{code}

This needs 3 * 4 bytes + 4 * 4 bytes + 3 * 4 bytes = 40 bytes.


I checked how many bytes were allocated for postings when the first segment was 
flushed.  Trunk flushed after 6400 docs and had 103MB allocated for PostingList 
objects.  2329 flushed after 8279 docs and had 94MB allocated for the parallel 
arrays, and 74MB out of the 94MB were actually used.

The first docs in the wikipedia dataset seem pretty large with many unique 
terms.

I think this sounds reasonable?

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-lived PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to a dense termID.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling and remove the overhead of object 
> initialization and garbage collection.  Garbage collection in particular 
> should benefit significantly when the JVM runs low on memory, because in such 
> a situation the gc mark times can get very long if there is a large number of 
> long-lived objects in memory.
> Another benefit could be building more efficient TermVectors.  We could avoid 
> having to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  We can make this improvement in a separate JIRA issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Implementing new collectors

2010-03-23 Thread Michael McCandless
OK put it up!  Sounds good :)

Mike

On Tue, Mar 23, 2010 at 1:54 PM, Grant Ingersoll  wrote:
>
> On Mar 23, 2010, at 1:20 PM, Michael McCandless wrote:
>
>> You can implement just the "out of order" collector, since it subsumes
>> the in-order case, and all will work fine.
>>
>> However, if the collector can save CPU when docs are known to arrive
>> in-order (not all collectors can) it'd be good to make a separate
>> in-order one as well.
>
> Since the thing I'm working on is a paging collector that extends 
> TopDocsCollector, and the logic looks more or less like that of 
> OutOfOrderTopScoreDocCollector, I think we could likewise save a few 
> if-checks with an in-order one.
>
> How about I put a patch up and then you can take a look?  My gut says it 
> should be possible.
>
> -Grant
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Implementing new collectors

2010-03-23 Thread Grant Ingersoll

On Mar 23, 2010, at 1:20 PM, Michael McCandless wrote:

> You can implement just the "out of order" collector, since it subsumes
> the in-order case, and all will work fine.
> 
> However, if the collector can save CPU when docs are known to arrive
> in-order (not all collectors can) it'd be good to make a separate
> in-order one as well.

Since the thing I'm working on is a paging collector that extends 
TopDocsCollector, and the logic looks more or less like that of 
OutOfOrderTopScoreDocCollector, I think we could likewise save a few if-checks 
with an in-order one.

How about I put a patch up and then you can take a look?  My gut says it should 
be possible.

-Grant
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Implementing new collectors

2010-03-23 Thread Michael McCandless
You can implement just the "out of order" collector, since it subsumes
the in-order case, and all will work fine.

However, if the collector can save CPU when docs are known to arrive
in-order (not all collectors can) it'd be good to make a separate
in-order one as well.
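
The point above can be shown with a minimal plain-Java sketch (no Lucene types; the class and method names here are hypothetical): an out-of-order top-N collector never assumes ascending docIDs, so it must break score ties by docID explicitly. That extra tie-break is exactly the work an in-order-only variant could skip.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of an "out of order" top-N collector: correct for any arrival
// order, and therefore also for in-order input (it subsumes that case).
class TopNCollector {
    static final class ScoreDoc {
        final int doc;
        final float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    // Min-heap: the least competitive hit sits on top.  Lower score loses;
    // on equal score the larger docID loses (the usual Lucene tie-break).
    private final PriorityQueue<ScoreDoc> pq;

    TopNCollector(int n) {
        this.n = n;
        this.pq = new PriorityQueue<>(
            Comparator.comparingDouble((ScoreDoc sd) -> sd.score)
                      .thenComparing(sd -> -sd.doc));
    }

    void collect(int doc, float score) {
        pq.offer(new ScoreDoc(doc, score));
        if (pq.size() > n) pq.poll();  // evict the least competitive hit
    }

    // Return the surviving docIDs, best score first.
    int[] topDocs() {
        return pq.stream()
                 .sorted((a, b) -> Float.compare(b.score, a.score))
                 .mapToInt(sd -> sd.doc)
                 .toArray();
    }
}
```

With in-order arrival, a hit whose score merely equals the current worst entry can be rejected without the docID comparison, which is the kind of per-hit saving a dedicated in-order collector buys.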

Mike

On Tue, Mar 23, 2010 at 12:53 PM, Grant Ingersoll  wrote:
> I'm still slightly confused about "in order" and "out of order" collectors.  
> I mean, I get what they do, but if I'm implementing a new collector (see 
> https://issues.apache.org/jira/browse/LUCENE-2215) that is going to be part 
> of core, should I implement two versions: one for in order and one for out 
> of order?
>
> -Grant
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2127) Improved large result handling

2010-03-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848795#action_12848795
 ] 

Grant Ingersoll commented on LUCENE-2127:
-

Hey Jason,

My tests are inconclusive on the patch posted above.  LUCENE-2215, however, 
seems promising and good to get into Solr as well.

> Improved large result handling
> --
>
> Key: LUCENE-2127
> URL: https://issues.apache.org/jira/browse/LUCENE-2127
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2127.patch, LUCENE-2127.patch
>
>
> Per 
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
>  it would be nice to offer some other Collectors that are better at handling 
> really large number of results.  This could be implemented in a variety of 
> ways via Collectors.  For instance, we could have a raw collector that does 
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests 
> and have Collectors that have heuristics about memory tradeoffs and only 
> heapify when appropriate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Implementing new collectors

2010-03-23 Thread Grant Ingersoll
I'm still slightly confused about "in order" and "out of order" collectors.  I 
mean, I get what they do, but if I'm implementing a new collector (see 
https://issues.apache.org/jira/browse/LUCENE-2215) that is going to be part of 
core, should I implement two versions: one for in order and one for out of 
order?  

-Grant
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2328.


Resolution: Fixed

> IndexWriter.synced  field accumulates data leading to a Memory Leak
> ---
>
> Key: LUCENE-2328
> URL: https://issues.apache.org/jira/browse/LUCENE-2328
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
> Environment: all
>Reporter: Gregor Kaczor
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
> LUCENE-2328.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files. This is repeated 60k times. Optimization
> is run after every 2k files indexed. Index size is 50KB. I analyzed
> the heap dump file and realized that the IndexWriter.synced field occupied
> more than half of the heap. That field is a private HashSet without a getter.
> Its task is to hold files which have already been synced.
> There are two calls to addAll and one call to add on synced, but no remove or
> clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries which
> look like file names "_e065_1.del" or "_e067.cfs".
> The index directory contains only 10 files.
> I guess synced is holding obsolete data 
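
The leak pattern described in the report, and one possible shape of a fix, can be sketched in a few lines. The class below is purely illustrative (the real field is a private HashSet inside IndexWriter): entries are only ever added to the synced set, so it must be pruned when files are deleted or it grows without bound over the writer's lifetime.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the accumulating-set bug and its fix.
class SyncTracker {
    private final Set<String> synced = new HashSet<>();

    void fileSynced(String name) { synced.add(name); }

    // Without a call like this, names of long-deleted files (e.g.
    // "_e065_1.del", "_e067.cfs") accumulate forever in the set.
    void fileDeleted(String name) { synced.remove(name); }

    int size() { return synced.size(); }
}
```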

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848789#action_12848789
 ] 

Michael McCandless commented on LUCENE-2328:


OK I will commit shortly!  Thanks Earwin :)

> IndexWriter.synced  field accumulates data leading to a Memory Leak
> ---
>
> Key: LUCENE-2328
> URL: https://issues.apache.org/jira/browse/LUCENE-2328
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
> Environment: all
>Reporter: Gregor Kaczor
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
> LUCENE-2328.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files. This is repeated 60k times. Optimization
> is run after every 2k files indexed. Index size is 50KB. I analyzed
> the heap dump file and realized that the IndexWriter.synced field occupied
> more than half of the heap. That field is a private HashSet without a getter.
> Its task is to hold files which have already been synced.
> There are two calls to addAll and one call to add on synced, but no remove or
> clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries which
> look like file names "_e065_1.del" or "_e067.cfs".
> The index directory contains only 10 files.
> I guess synced is holding obsolete data 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848785#action_12848785
 ] 

Earwin Burrfoot commented on LUCENE-2339:
-

I'll get back to the issue in N hours and code something neat. : )

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848782#action_12848782
 ] 

Michael Busch commented on LUCENE-2329:
---

{quote}
OK, but, RAM used by TermVectors* shouldn't participate in the accounting... ie 
it only holds RAM for the one doc, at a time.
{quote}

Man, my brain is lacking the TermVector synapses...

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-lived PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to a dense termID.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling and remove the overhead of object 
> initialization and garbage collection.  Garbage collection in particular 
> should benefit significantly when the JVM runs low on memory, because in such 
> a situation the gc mark times can get very long if there is a large number of 
> long-lived objects in memory.
> Another benefit could be building more efficient TermVectors.  We could avoid 
> having to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  We can make this improvement in a separate JIRA issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848778#action_12848778
 ] 

Uwe Schindler commented on LUCENE-2328:
---

I am fine now! Go for it! Policeman is happy.

> IndexWriter.synced  field accumulates data leading to a Memory Leak
> ---
>
> Key: LUCENE-2328
> URL: https://issues.apache.org/jira/browse/LUCENE-2328
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
> Environment: all
>Reporter: Gregor Kaczor
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
> LUCENE-2328.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files. This is repeated 60k times. Optimization
> is run after every 2k files indexed. Index size is 50KB. I analyzed
> the heap dump file and realized that the IndexWriter.synced field occupied
> more than half of the heap. That field is a private HashSet without a getter.
> Its task is to hold files which have already been synced.
> There are two calls to addAll and one call to add on synced, but no remove or
> clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries which
> look like file names "_e065_1.del" or "_e067.cfs".
> The index directory contains only 10 files.
> I guess synced is holding obsolete data 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848777#action_12848777
 ] 

Shai Erera commented on LUCENE-2339:


Ok, that's indeed different :). I guess we can introduce it now, in this issue 
(it's tiny and simple): a closeAll which documents that it throws the first 
exception it hits.
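
The closeAll being proposed could look roughly like this. It is a sketch of a hypothetical helper, not an existing Lucene API: close every resource, remember only the first exception, and rethrow it after all resources have been given a chance to close.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical utility implementing "throw the first exception you hit".
final class CloseUtil {
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable c : resources) {
            try {
                if (c != null) c.close();
            } catch (IOException e) {
                if (first == null) first = e;  // later failures are dropped
            }
        }
        if (first != null) throw first;
    }
}
```

The design choice is that a failure to close one stream must not prevent the remaining streams from being closed; only the first failure is reported to the caller.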

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848769#action_12848769
 ] 

Michael McCandless commented on LUCENE-2339:


bq. My assumption is that when you call closeNoException you already know that 
you've hit an exception and just want to close the stream w/o getting more 
exceptions. If you don't know that, don't call closeNoException?

Right, for this issue, let's do that.

At some point in the future I'd like a "closeAllAndThrowFirstExceptionYouHit" :)

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848767#action_12848767
 ] 

Michael McCandless commented on LUCENE-2329:


bq. But, keep in mind that TermVectors were enabled too.

OK, but, RAM used by TermVectors* shouldn't participate in the accounting... ie 
it only holds RAM for the one doc, at a time.

bq. And the number of "unique terms" in the 2nd TermsHash is higher, i.e. if 
you summed up numPostings from the 2nd TermsHash in each round that sum should 
be higher than numPostings from the first TermsHash.

1st TermsHash = current trunk and 2nd TermsHash = this patch?  Ie, it has more 
unique terms at flush time (because it's more RAM efficient)?  If so, then yes, 
I agree :)  But 22% fewer still seems too good to be true...

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-living PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to dense termIDs.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling, will remove the overhead of object 
> initialization and garbage collection.  Especially garbage collection should 
> benefit significantly when the JVM runs out of memory, because in such a 
> situation the gc mark times can get very long if there is a big number of 
> long-living objects in memory.
> Another benefit could be to build more efficient TermVectors.  We could avoid 
> the need of having to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  Though we can make this improvement in a separate JIRA issue.
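Sketched very roughly, the parallel-arrays idea described above reads like this (a standalone illustration with invented names — ParallelPostings, add, docFreq — not Lucene's actual TermsHashPerField code): one hash maps each term to a dense termID, and all per-term stats live in primitive arrays indexed by that ID, grown together, so there is one long-lived array per statistic instead of one long-lived object per term.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Schematic illustration (hypothetical names, not Lucene's real code) of
 * parallel arrays keyed by dense termIDs instead of per-term PostingList
 * objects, as proposed in LUCENE-2329.
 */
class ParallelPostings {
    private final Map<String, Integer> termToId = new HashMap<>(); // stand-in for the int[] hash
    private int[] docFreqs = new int[4];   // parallel array: docFreq per termID
    private int[] lastDocIDs = new int[4]; // parallel array: last doc seen per termID
    private int nextId = 0;

    /** Record one occurrence of term in docID; returns the term's dense ID. */
    int add(String term, int docID) {
        Integer id = termToId.get(term);
        if (id == null) {
            id = nextId++;
            termToId.put(term, id);
            if (id >= docFreqs.length) {
                // grow all parallel arrays together; no object allocation per term
                docFreqs = java.util.Arrays.copyOf(docFreqs, 2 * docFreqs.length);
                lastDocIDs = java.util.Arrays.copyOf(lastDocIDs, 2 * lastDocIDs.length);
            }
            lastDocIDs[id] = -1;
        }
        if (lastDocIDs[id] != docID) { // count each doc at most once per term
            docFreqs[id]++;
            lastDocIDs[id] = docID;
        }
        return id;
    }

    int docFreq(int termID) {
        return docFreqs[termID];
    }
}
```

Because the arrays are primitive and few, the garbage collector only ever sees a handful of long-lived objects, which is the point of the proposal.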




RE: Merge Status

2010-03-23 Thread Uwe Schindler
I changed the hudson nightly job of Lucene trunk to simply check out the correct 
folders (Lucene-only). This builds lucene separately from solr, so all artifacts 
are built with JDK 1.5. This enables solr to maybe move to 1.6.

I changed the solr build to check out the new trunk and use a shell script 
similar to lucene's to build solr. It will automatically build the lucene jars 
via the new build.xml. Also, solr now uses Clover 2.6.3. Hopefully the next 
build (currently pending, waiting for the lucene test build) runs fine.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant
> Ingersoll
> Sent: Tuesday, March 23, 2010 3:49 PM
> Cc: solr-...@lucene.apache.org; java-dev@lucene.apache.org
> Subject: Re: Merge Status
> 
> 
> On Mar 23, 2010, at 10:09 AM, Grant Ingersoll wrote:
> 
> >
> > 3. Other nightly build stuff.  My cron tabs, etc.  I will update them
> to point at the new trunk.
> 
> OK, I updated my cron tab for the site check out of Lucene.  Not sure
> who handles Solr.
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org






[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848753#action_12848753
 ] 

Shai Erera commented on LUCENE-2339:


bq. But there is still a need to "close everything, but do throw the 1st 
exception you hit".

Ohh I see what you mean. My assumption is that when you call closeNoException 
you already know that you've hit an exception and just want to close the stream 
w/o getting more exceptions. If you don't know that, don't call 
closeNoException?

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> For example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.




[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848748#action_12848748
 ] 

Michael Busch commented on LUCENE-2329:
---

{quote}
so it's surprising the savings was so much that you get 22% fewer segments... 
are you sure there isn't a bug in the RAM usage accounting?
{quote}

Yeah it seems a bit suspicious.  I'll investigate.  But, keep in mind that 
TermVectors were enabled too.  And the number of "unique terms" in the 2nd 
TermsHash is higher, i.e. if you summed up numPostings from the 2nd TermsHash 
in each round that sum should be higher than numPostings from the first 
TermsHash. 

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-living PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to dense termIDs.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling, will remove the overhead of object 
> initialization and garbage collection.  Especially garbage collection should 
> benefit significantly when the JVM runs out of memory, because in such a 
> situation the gc mark times can get very long if there is a big number of 
> long-living objects in memory.
> Another benefit could be to build more efficient TermVectors.  We could avoid 
> the need of having to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  Though we can make this improvement in a separate JIRA issue.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848744#action_12848744
 ] 

Michael McCandless commented on LUCENE-2339:


bq. So how about we call it closeNoException, document that it does not throw 
any exception and intentionally suppresses them, and if you don't want them to 
be suppressed, you can call io.close() yourself?

But there is still a need to "close everything, but do throw the 1st exception 
you hit".  We do this in a number of places in Lucene, ad-hoc today.

However, that need is different from what we're doing here, so I agree, let's 
postpone it and have this issue only create the "closeNoException" method.
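The two closing policies being discussed can be sketched as a small utility. The names (IOUtil, closeNoException, closeAll) are placeholders taken from this thread, not a committed Lucene API: closeNoException() suppresses every exception (for use when you are already handling a failure), while closeAll() closes everything but rethrows the first exception it hit.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Sketch of the two policies discussed in this thread (hypothetical names,
 * not an actual Lucene API).
 */
final class IOUtil {
    /** Close all streams, swallowing any exceptions; call this when you've already hit a failure. */
    static void closeNoException(Closeable... objects) {
        for (Closeable c : objects) {
            try {
                if (c != null) c.close();
            } catch (IOException ignored) {
                // intentionally suppressed
            }
        }
    }

    /** Close all streams, but rethrow the first exception hit after trying every one. */
    static void closeAll(Closeable... objects) throws IOException {
        IOException first = null;
        for (Closeable c : objects) {
            try {
                if (c != null) c.close();
            } catch (IOException e) {
                if (first == null) first = e;
            }
        }
        if (first != null) throw first;
    }
}
```

The key difference is the declared `throws IOException` on the second variant: callers of closeNoException never need a try-catch, which is exactly why a boolean suppressExceptions argument would defeat its purpose.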

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> For example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.




Re: Merge Status

2010-03-23 Thread Yonik Seeley
On Tue, Mar 23, 2010 at 10:49 AM, Grant Ingersoll  wrote:
>
> On Mar 23, 2010, at 10:09 AM, Grant Ingersoll wrote:
>
>>
>> 3. Other nightly build stuff.  My cron tabs, etc.  I will update them to 
>> point at the new trunk.
>
> OK, I updated my cron tab for the site check out of Lucene.  Not sure who 
> handles Solr.

Solr's has always been manual.

-Yonik




Re: Merge Status

2010-03-23 Thread Grant Ingersoll

On Mar 23, 2010, at 10:09 AM, Grant Ingersoll wrote:

> 
> 3. Other nightly build stuff.  My cron tabs, etc.  I will update them to 
> point at the new trunk.

OK, I updated my cron tab for the site check out of Lucene.  Not sure who 
handles Solr.



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848729#action_12848729
 ] 

Shai Erera commented on LUCENE-2339:


Mike, that's what I wrote above: "if someone does not want to suppress, he 
should call close". I think that closeSafely (or, as I prefer it, 
closeNoException) should be called only when you know you've hit an exception 
and you want to close the stream while suppressing any further exceptions. 
Otherwise call close().

bq. can we add a boolean arg (suppressExceptions) to control that?

That would defeat the purpose of the method, no? I mean, currently it does not 
throw any exception, not even declaring one, and if we add that boolean it will 
need to declare "throws IOException", which will force the caller to try-catch 
that exception and ... suppress it, or document "// cannot happen because I've 
passed false"?

So how about we call it closeNoException, document that it does not throw any 
exception and intentionally suppresses them, and if you don't want them to be 
suppressed, you can call io.close() yourself?

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.




[jira] Updated: (LUCENE-2272) PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'

2010-03-23 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-2272:
-

Attachment: PNQ-patch.txt

There is a bug in PayloadNearQuery: if there are multiple top-level spans that 
match the query, only the payloads of the first one are retrieved. This patch 
fixes the bug by iterating over all the top-level spans to get the payloads 
(see 'setFreqCurrentDoc').
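The shape of that fix can be sketched with simplified stand-in types (this is not the real Spans/PayloadNearQuery API — Span and payloadsForDoc are invented for illustration): the point is to keep iterating over every top-level span on the current document instead of stopping after the first match.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration only (hypothetical types, not Lucene's Spans API) of
 * collecting payloads from all matching top-level spans on a document.
 */
class PayloadAccumulator {
    static class Span {
        final int doc;
        final List<byte[]> payloads;
        Span(int doc, List<byte[]> payloads) { this.doc = doc; this.payloads = payloads; }
    }

    /** Collect payloads from every span on the given doc; the buggy code stopped after one. */
    static List<byte[]> payloadsForDoc(List<Span> topLevelSpans, int doc) {
        List<byte[]> result = new ArrayList<>();
        for (Span s : topLevelSpans) {
            if (s.doc == doc) {
                result.addAll(s.payloads); // keep going: later spans on the same doc count too
            }
        }
        return result;
    }
}
```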

> The base explain method can't be abstract. Something like
Ah, right. This is included in the patch.

> The changes don't seem thread safe any more since there are now member 
> variables. It may still be all right, but have you looked at this aspect?

I guess that could be said about PayloadTermSpanScorer and 
PayloadNearSpanScorer, too (payloadScore, payloadsSeen). As for the 
PayloadFunction classes, they seem lightweight enough to be created with each 
query. Is there a better pattern?

Peter




> PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'
> ---
>
> Key: LUCENE-2272
> URL: https://issues.apache.org/jira/browse/LUCENE-2272
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Reporter: Peter Keegan
>Assignee: Grant Ingersoll
> Attachments: payloadfunctin-patch.txt, PNQ-patch.txt
>
>
> The 'explain' method in PayloadNearSpanScorer assumes the 
> AveragePayloadFunction was used. This patch adds the 'explain' method to the 
> 'PayloadFunction' interface, where the Scorer can call it. Added unit tests 
> for 'explain' and for {Min,Max}PayloadFunction.




Re: Merge Status

2010-03-23 Thread Yonik Seeley
If you have checkouts of the previous trunks that you don't want to
re-checkout, then use svn switch.

Solr trunk was moved to a 1.5 branch, so for old trunk checkouts, cd
into your directory and do
svn switch https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.5-dev

For "newtrunk" checkouts of combined lucene/solr:
svn switch  https://svn.apache.org/repos/asf/lucene/dev/trunk

For lucene only:
svn switch  https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene

-Yonik



On Tue, Mar 23, 2010 at 10:09 AM, Grant Ingersoll  wrote:
> Quick Status on where we are at:
>
> The new trunk for the merge is now open:  
> https://svn.apache.org/repos/asf/lucene/dev/trunk/  All committers should 
> have the same rights that they had before.
>
> Please do all development on those.
>
> The new mailing list has been requested: 
> https://issues.apache.org/jira/browse/INFRA-2567.  It will be named 
> lucene-solr-...@l.a.o.  All existing subscriptions should be automatically 
> moved, unless there is something about the mailing lists that I don't 
> understand and it isn't possible.
>
> All commits for the new trunk are going to be sent to: java-comm...@l.a.o.  I 
> went this route instead of asking for a new mailing list as it seemed easier 
> and it is a very low subscription level anyway.
>
> What's left:
>
> 1. Website updates - We should probably have a shared committer page.  I 
> guess we should just point the Solr "Who we are" page at the Lucene one and 
> add the delta between the two to the Lucene list.
>
> 2. Hudson?  Uwe?
>
> 3. Other nightly build stuff.  My cron tabs, etc.  I will update them to 
> point at the new trunk.
>
> Anything else?
>
> -Grant
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>




Merge Status

2010-03-23 Thread Grant Ingersoll
Quick Status on where we are at:

The new trunk for the merge is now open:  
https://svn.apache.org/repos/asf/lucene/dev/trunk/  All committers should have 
the same rights that they had before.

Please do all development on those.

The new mailing list has been requested: 
https://issues.apache.org/jira/browse/INFRA-2567.  It will be named 
lucene-solr-...@l.a.o.  All existing subscriptions should be automatically 
moved, unless there is something about the mailing lists that I don't 
understand and it isn't possible.

All commits for the new trunk are going to be sent to: java-comm...@l.a.o.  I 
went this route instead of asking for a new mailing list as it seemed easier 
and it is a very low subscription level anyway.

What's left:

1. Website updates - We should probably have a shared committer page.  I guess 
we should just point the Solr "Who we are" page at the Lucene one and add the 
delta between the two to the Lucene list.

2. Hudson?  Uwe?

3. Other nightly build stuff.  My cron tabs, etc.  I will update them to point 
at the new trunk.

Anything else?

-Grant



[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848718#action_12848718
 ] 

Michael McCandless commented on LUCENE-1709:


+1 for removing the flags and committing parallel tests for Lucene too.

> Parallelize Tests
> -
>
> Key: LUCENE-1709
> URL: https://issues.apache.org/jira/browse/LUCENE-1709
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
> Fix For: 3.1
>
> Attachments: LUCENE-1709.patch, runLuceneTests.py
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The Lucene tests can be parallelized to make for a faster testing system.  
> This task from ANT can be used: 
> http://ant.apache.org/manual/CoreTasks/parallel.html
> Previous discussion: 
> http://www.gossamer-threads.com/lists/lucene/java-dev/69669
> Notes from Mike M.:
> {quote}
> I'd love to see a clean solution here (the tests are embarrassingly
> parallelizable, and we all have machines with good concurrency these
> days)... I have a rather hacked up solution now, that uses
> "-Dtestpackage=XXX" to split the tests up.
> Ideally I would be able to say "use N threads" and it'd do the right
> thing... like the -j flag to make.
> {quote}




[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-03-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848712#action_12848712
 ] 

Mark Miller commented on LUCENE-1709:
-

+1 on removing those flags - personally I find them unnecessary - and they 
complicate the build.

And I would love to see Lucene go parallel like Solr now.

> Parallelize Tests
> -
>
> Key: LUCENE-1709
> URL: https://issues.apache.org/jira/browse/LUCENE-1709
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
> Fix For: 3.1
>
> Attachments: LUCENE-1709.patch, runLuceneTests.py
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The Lucene tests can be parallelized to make for a faster testing system.  
> This task from ANT can be used: 
> http://ant.apache.org/manual/CoreTasks/parallel.html
> Previous discussion: 
> http://www.gossamer-threads.com/lists/lucene/java-dev/69669
> Notes from Mike M.:
> {quote}
> I'd love to see a clean solution here (the tests are embarrassingly
> parallelizable, and we all have machines with good concurrency these
> days)... I have a rather hacked up solution now, that uses
> "-Dtestpackage=XXX" to split the tests up.
> Ideally I would be able to say "use N threads" and it'd do the right
> thing... like the -j flag to make.
> {quote}




[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-03-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848707#action_12848707
 ] 

Robert Muir commented on LUCENE-1709:
-

Thanks Jason.

So for newtrunk I applied a similar patch to speed up Solr's tests.
You can see it here: http://svn.apache.org/viewvc?rev=926470&view=rev

In this case the output is not interleaved because it uses a special formatter.
So it basically looks just like you are not using parallel at all.
Additionally -Dtestpackage, -Dtestpackageroot, -Dtestcase all work, the former 
two are also parallelized.

So, I propose we do the same thing for Lucene tests.
Solr was simple because it does not have these junit failed flag files.
I propose we just remove these, like how Solr does contrib.
Hudson hasn't failed in over a month by the way.


> Parallelize Tests
> -
>
> Key: LUCENE-1709
> URL: https://issues.apache.org/jira/browse/LUCENE-1709
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
> Fix For: 3.1
>
> Attachments: LUCENE-1709.patch, runLuceneTests.py
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The Lucene tests can be parallelized to make for a faster testing system.  
> This task from ANT can be used: 
> http://ant.apache.org/manual/CoreTasks/parallel.html
> Previous discussion: 
> http://www.gossamer-threads.com/lists/lucene/java-dev/69669
> Notes from Mike M.:
> {quote}
> I'd love to see a clean solution here (the tests are embarrassingly
> parallelizable, and we all have machines with good concurrency these
> days)... I have a rather hacked up solution now, that uses
> "-Dtestpackage=XXX" to split the tests up.
> Ideally I would be able to say "use N threads" and it'd do the right
> thing... like the -j flag to make.
> {quote}




Re: New LuSolr trunk (was: RE: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling)

2010-03-23 Thread Yonik Seeley
For Solr, we can just move the current trunk to a 1.5 branch.

-Yonik

On Tue, Mar 23, 2010 at 9:39 AM, Grant Ingersoll  wrote:
>
> On Mar 22, 2010, at 8:27 AM, Uwe Schindler wrote:
>
>> Hi all,
>>
>> the discussion about where to do development after the merge is now becoming concrete:
>>
>> Currently a lusolr test-trunk is done as a branch inside solr 
>> (https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk). The 
>> question is, where to put the main development and how to switch, so 
>> non-developers that have checkouts of solr and/or lucene will see the change 
>> and do not send us outdated patches.
>>
>> I propose to do the following:
>>
>> - Start a new top-level project folder inside /lucene root svn folder: 
>> https://svn.apache.org/repos/asf/lucene/lusolr (please see "lusolr" as a 
>> placeholder name) and add branches, tags subfolders to it. Do not create 
>> trunk and do this together with the next step.
>
> OK, I created https://svn.apache.org/repos/asf/lucene/dev/ and granted 
> appropriate rights.  Uwe, you can now do the rest of the move.  Once you've 
> done it, let me know and I can make sure to add back the contrib rights.
>
>> - Move the branch from 
>> https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk to this new 
>> directory as "trunk"
>> - For lucene flexible indexing, create a corresponding flex branch there and 
>> svn copy it from current new trunk. Merge the lucene flex changes into it. 
>> Alternatively, land flex now. Or simply do svn copy of current flex branch 
>> instead of merging (may be less work).
>> - Do the same for possible solr branches in development
>> - Create a tag in the lucene tags folder and in the solr tags folder with 
>> the current state of each trunk. After that delete all contents from old 
>> trunk in solr and lucene and place a readme file pointing developers to the 
>> new merged trunk folder (for both old trunks). This last step is important, 
>> else people who checkout the old trunk will soon see a very outdated view 
>> and may send us outdated patches in JIRA. When the contents of old-trunk 
>> disappear it's obvious to them what happened. If they had already some 
>> changes in their checkout, the svn client will keep the changed files as 
>> unversioned (after upgrade). The history keeps available, so it's also 
>> possible to checkout an older version from trunk using @rev or -r rev. I did 
>> a similar step with some backwards compatibility changes in lucene (add a 
>> README).
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -Original Message-
>>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>>> Sent: Monday, March 22, 2010 11:37 AM
>>> To: java-dev@lucene.apache.org
>>> Subject: Re: (LUCENE-2297) IndexWriter should let you optionally enable
>>> reader pooling
>>>
>>> I think we should.
>>>
>>> It (newtrunk) was created to test Hoss's side-by-side proposal, and
>>> that approach looks to be working very well.
>>>
>>> Up until now we've been committing to the old trunk and then
>>> systematically merging over to newtrunk.  I think we should now flip
>>> that, ie, commit to newtrunk and only merge back to the old trunk if
>>> for some strange reason it's needed.
>>>
>>> Mike
>>>
>>> On Mon, Mar 22, 2010 at 6:32 AM, Uwe Schindler  wrote:
 Are we now only working on newtrunk?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless (JIRA) [mailto:j...@apache.org]
> Sent: Monday, March 22, 2010 11:22 AM
> To: java-dev@lucene.apache.org
> Subject: [jira] Resolved: (LUCENE-2297) IndexWriter should let you
> optionally enable reader pooling
>
>
>     [ https://issues.apache.org/jira/browse/LUCENE-
> 2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-
>>> tabpanel
> ]
>
> Michael McCandless resolved LUCENE-2297.
> 
>
>    Resolution: Fixed
>
> Fixed on newtrunk.
>
>> IndexWriter should let you optionally enable reader pooling
>> ---
>>
>>                Key: LUCENE-2297
>>                URL: https://issues.apache.org/jira/browse/LUCENE-
> 2297
>>            Project: Lucene - Java
>>         Issue Type: Improvement
>>           Reporter: Michael McCandless
>>           Priority: Minor
>>            Fix For: 3.1
>>
>>        Attachments: LUCENE-2297.patch
>>
>>
>> For apps using a large index and frequently need to commit and
> resolve deletes, the cost of opening the SegmentReaders on demand
>>> for
> every commit can be prohibitive.
>> We can already pool readers (NRT does so), but we only turn it on
>>> if
> NRT readers are in use.

Re: New LuSolr trunk (was: RE: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling)

2010-03-23 Thread Grant Ingersoll

On Mar 22, 2010, at 8:27 AM, Uwe Schindler wrote:

> Hi all,
> 
> the discussion about where to do development after the merge is now becoming concrete:
> 
> Currently a lusolr test-trunk is done as a branch inside solr 
> (https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk). The 
> question is, where to put the main development and how to switch, so 
> non-developers that have checkouts of solr and/or lucene will see the change 
> and do not send us outdated patches.
> 
> I propose to do the following:
> 
> - Start a new top-level project folder inside /lucene root svn folder: 
> https://svn.apache.org/repos/asf/lucene/lusolr (please see "lusolr" as a 
> placeholder name) and add branches, tags subfolders to it. Do not create 
> trunk and do this together with the next step.

OK, I created https://svn.apache.org/repos/asf/lucene/dev/ and granted 
appropriate rights.  Uwe, you can now do the rest of the move.  Once you've 
done it, let me know and I can make sure to add back the contrib rights.

> - Move the branch from 
> https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk to this new 
> directory as "trunk"
> - For lucene flexible indexing, create a corresponding flex branch there and 
> svn copy it from current new trunk. Merge the lucene flex changes into it. 
> Alternatively, land flex now. Or simply do svn copy of current flex branch 
> instead of merging (may be less work).
> - Do the same for possible solr branches in development
> - Create a tag in the lucene tags folder and in the solr tags folder with the 
> current state of each trunk. After that delete all contents from old trunk in 
> solr and lucene and place a readme file pointing developers to the new merged 
> trunk folder (for both old trunks). This last step is important, else people 
> who checkout the old trunk will soon see a very outdated view and may send us 
> outdated patches in JIRA. When the contents of old-trunk disappear it's 
> obvious to them what happened. If they had already some changes in their 
> checkout, the svn client will keep the changed files as unversioned (after 
> upgrade). The history keeps available, so it's also possible to checkout an 
> older version from trunk using @rev or -r rev. I did a similar step with some 
> backwards compatibility changes in lucene (add a README).
> 
> Uwe
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
>> -Original Message-
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Monday, March 22, 2010 11:37 AM
>> To: java-dev@lucene.apache.org
>> Subject: Re: (LUCENE-2297) IndexWriter should let you optionally enable
>> reader pooling
>> 
>> I think we should.
>> 
>> It (newtrunk) was created to test Hoss's side-by-side proposal, and
>> that approach looks to be working very well.
>> 
>> Up until now we've been committing to the old trunk and then
>> systematically merging over to newtrunk.  I think we should now flip
>> that, ie, commit to newtrunk and only merge back to the old trunk if
>> for some strange reason it's needed.
>> 
>> Mike
>> 
>> On Mon, Mar 22, 2010 at 6:32 AM, Uwe Schindler  wrote:
>>> Are we now only working on newtrunk?
>>> 
>>> -
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>> 
 -Original Message-
 From: Michael McCandless (JIRA) [mailto:j...@apache.org]
 Sent: Monday, March 22, 2010 11:22 AM
 To: java-dev@lucene.apache.org
 Subject: [jira] Resolved: (LUCENE-2297) IndexWriter should let you
 optionally enable reader pooling
 
 
 [ https://issues.apache.org/jira/browse/LUCENE-
 2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-
>> tabpanel
 ]
 
 Michael McCandless resolved LUCENE-2297.
 
 
Resolution: Fixed
 
 Fixed on newtrunk.
 
> IndexWriter should let you optionally enable reader pooling
> ---
> 
>Key: LUCENE-2297
>URL: https://issues.apache.org/jira/browse/LUCENE-
 2297
>Project: Lucene - Java
> Issue Type: Improvement
>   Reporter: Michael McCandless
>   Priority: Minor
>Fix For: 3.1
> 
>Attachments: LUCENE-2297.patch
> 
> 
> For apps using a large index and frequently need to commit and
 resolve deletes, the cost of opening the SegmentReaders on demand
>> for
 every commit can be prohibitive.
> We can already pool readers (NRT does so), but we only turn it on
>> if
 NRT readers are in use.
> We should allow separate control.
> We should do this after LUCENE-2294.
 
 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue on

Re: New LuSolr trunk

2010-03-23 Thread Mark Miller

This looks good to me.

+1 on landing flex now.

On 03/22/2010 08:27 AM, Uwe Schindler wrote:

Hi all,

the discussion about where development happens after the merge is now becoming concrete:

Currently a combined lucene/solr test trunk lives as a branch inside solr 
(https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk). The question 
is where to put the main development and how to switch, so that non-developers who 
have checkouts of solr and/or lucene will see the change and not send us 
outdated patches.

I propose to do the following:

- Start a new top-level project folder inside /lucene root svn folder: 
https://svn.apache.org/repos/asf/lucene/lusolr (please see "lusolr" as a 
placeholder name) and add branches, tags subfolders to it. Do not create trunk and do 
this together with the next step.
- Move the branch from https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk to 
this new directory as "trunk"
- For lucene flexible indexing, create a corresponding flex branch there and 
svn copy it from current new trunk. Merge the lucene flex changes into it. 
Alternatively, land flex now. Or simply do svn copy of current flex branch 
instead of merging (may be less work).
- Do the same for possible solr branches in development
- Create a tag in the lucene tags folder and in the solr tags folder with the 
current state of each trunk. After that delete all contents from old trunk in 
solr and lucene and place a readme file pointing developers to the new merged 
trunk folder (for both old trunks). This last step is important, else people 
who checkout the old trunk will soon see a very outdated view and may send us 
outdated patches in JIRA. When the contents of the old trunks disappear, it's obvious 
to them what happened. If they already had some changes in their checkout, the 
svn client will keep the changed files as unversioned (after an update). The 
history remains available, so it's also possible to check out an older version 
from trunk using @rev or -r rev. I did a similar step with some backwards 
compatibility changes in lucene (adding a README).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


   


Re: New LuSolr trunk (was: RE: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling)

2010-03-23 Thread Grant Ingersoll

On Mar 22, 2010, at 8:27 AM, Uwe Schindler wrote:

> Hi all,
> 
> the discussion about where development happens after the merge is now becoming concrete:
> 
> Currently a combined lucene/solr test trunk lives as a branch inside solr 
> (https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk). The 
> question is where to put the main development and how to switch, so that 
> non-developers who have checkouts of solr and/or lucene will see the change 
> and not send us outdated patches.
> 
> I propose to do the following:
> 
> - Start a new top-level project folder inside /lucene root svn folder: 
> https://svn.apache.org/repos/asf/lucene/lusolr (please see "lusolr" as a 
> placeholder name) and add branches, tags subfolders to it. Do not create 
> trunk and do this together with the next step.
> - Move the branch from 
> https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk to this new 
> directory as "trunk"

OK, makes sense.  Frankly, I think we could just keep the name "java" for 
"lusolr", but "search" would work too, or even something as simple as "dev".

> - For lucene flexible indexing, create a corresponding flex branch there and 
> svn copy it from current new trunk. Merge the lucene flex changes into it. 
> Alternatively, land flex now. Or simply do svn copy of current flex branch 
> instead of merging (may be less work).
> - Do the same for possible solr branches in development
> - Create a tag in the lucene tags folder and in the solr tags folder with the 
> current state of each trunk. After that delete all contents from old trunk in 
> solr and lucene and place a readme file pointing developers to the new merged 
> trunk folder (for both old trunks). This last step is important, else people 
> who checkout the old trunk will soon see a very outdated view and may send us 
> outdated patches in JIRA. When the contents of old-trunk disappear it's 
> obvious to them what happened. If they had already some changes in their 
> checkout, the svn client will keep the changed files as unversioned (after 
> upgrade). The history keeps available, so it's also possible to checkout an 
> older version from trunk using @rev or -r rev. I did a similar step with some 
> backwards compatibility changes in lucene (add a README).

Makes sense.  We can always move things again if we need to.  This isn't CVS 
after all.
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848685#action_12848685
 ] 

Michael McCandless commented on LUCENE-2339:


Urgh... can we add a boolean arg (suppressExceptions) to control that?  
Because, if you did not hit an exception when copying, but then hit one when 
closing, we want to throw it in that case...



> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.
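The request above — copy only the files belonging to one commit rather than the whole directory — can be sketched as a small standalone helper. This is a hypothetical illustration using java.nio, not the actual Directory.copy() signature from the attached patches:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

// Hypothetical sketch: copy only the named files (e.g. the files
// referenced by one commit point) instead of everything in a directory.
public final class SelectiveCopy {
    public static void copy(Path src, Path dest, Collection<String> fileNames)
            throws IOException {
        for (String name : fileNames) {
            Files.copy(src.resolve(name), dest.resolve(name),
                       StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A caller would pass, say, the file names returned by an IndexCommit, leaving unrelated files in the source directory untouched.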

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848678#action_12848678
 ] 

Earwin Burrfoot commented on LUCENE-2339:
-

Not right.
Imagine an exception is thrown while copying, and then I try to close the channels. If 
that close throws another exception, I either have to suppress it, or throw it 
and thus hide the initial exception.

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2341) explore morfologik integration

2010-03-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848674#action_12848674
 ] 

Robert Muir commented on LUCENE-2341:
-

oh yeah, and also, what about cases with multiple solutions? does it emit 
multiple stems?


> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2341) explore morfologik integration

2010-03-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848672#action_12848672
 ] 

Robert Muir commented on LUCENE-2341:
-

Dawid, sounds good. I had a few questions, admittedly not having time to look 
at the code much:
* Does it support generation as well?
* Does it expose any morph attributes (e.g. POS) ?

> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848670#action_12848670
 ] 

Michael McCandless commented on LUCENE-2339:


bq. Let's mask it? That way the user may get the wrong exception, but he's not 
getting a situation when something failed but looks okay on the surface.

By "mask it" you mean hold onto the first exception you hit, continue closing & 
ignoring any further exceptions, then throw that first exception, right?

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Mailing List merge

2010-03-23 Thread Grant Ingersoll
https://issues.apache.org/jira/browse/INFRA-2567

On Mar 22, 2010, at 11:44 AM, Grant Ingersoll wrote:

> Shall we merge the dev mailing lists?  This should reduce the cross-posting 
> and can be completely automated (other than you may have to update your 
> client-side filters) and was part of the plan to merge dev efforts.
> 
> I'd propose it be called lucene-solr-...@l.a.o.  I can put in an issue for 
> infra@ to do it. 
> 
> -Grant



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2341) explore morfologik integration

2010-03-23 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848649#action_12848649
 ] 

Dawid Weiss commented on LUCENE-2341:
-

Robert, should I wait for Stempel patch first and then model this one after 
you? I'm thinking we can reuse most of the code; these stemmers have nearly 
identical APIs anyway.

> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Reporter: Robert Muir
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2298) Polish Analyzer

2010-03-23 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848648#action_12848648
 ] 

Dawid Weiss commented on LUCENE-2298:
-

The dictionary's author states that:

"Attribution-sa to ta, na której jest udostępniany słownik." ("Attribution-SA 
is the one under which the dictionary is made available.")

so we can pick the CC-SA license that Apache supposedly permits in the 
repositories. This is good news. Switching to LUCENE-2341 then.

> Polish Analyzer
> ---
>
> Key: LUCENE-2298
> URL: https://issues.apache.org/jira/browse/LUCENE-2298
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/analyzers
>Affects Versions: 3.1
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1
>
> Attachments: LUCENE-2298.patch, stemmer_2.7z
>
>
> Andrzej Bialecki has written a Polish stemmer and provided stemming tables 
> for it under Apache License.
> You can read more about it here: http://www.getopt.org/stempel/
> In reality, the stemmer is general code and we could use it for more 
> languages too perhaps.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848636#action_12848636
 ] 

Shai Erera commented on LUCENE-2339:


I don't want to block the issue. If LUCENE-1482 advances somewhere, we'll 
log a message in closeSafely. Otherwise, between suppressing and always printing, 
I agree we should suppress. If someone does not want to suppress, he should call 
close(). Which makes me think we should call this method closeNoException, 
because closeSafely is not exactly what it does :).

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848634#action_12848634
 ] 

Earwin Burrfoot commented on LUCENE-2339:
-

bq. So unless LUCENE-1482 springs back to life again, what do you suggest we 
do? Suppressing the exceptions seems wrong to me.
bq. But can we change it to throw the first exception it encounters? 
That's exactly what most of Lucene does when closing something. If you 
can't log, you either suppress, or mask the previous exception.
Let's mask it? That way the user may get the wrong exception, but he does not 
get a situation where something failed yet looks okay on the surface.

bq. I love CloseSafely! We do that in a number of places and should simply call 
it, instead.
I did this for readers in my reopen patch, except new utility method does 
decRef.

bq. I also prefer Arrays.asList to be explicit
ok :/

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848632#action_12848632
 ] 

Michael McCandless commented on LUCENE-2328:


I think it's OK to make an exception to back-compat here.  Users who subclass 
FSDir, and also "borrow" SimpleFSDir's IndexOutput impl, are very advanced and 
can change their code.  The break will also be very clear -- compilation error, 
which you must fix to move on -- so we're not making a trap here.

Uwe are you OK with the rename?  I think it actually does make sense that it be 
in the base class...

> IndexWriter.synced  field accumulates data leading to a Memory Leak
> ---
>
> Key: LUCENE-2328
> URL: https://issues.apache.org/jira/browse/LUCENE-2328
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
> Environment: all
>Reporter: Gregor Kaczor
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
> LUCENE-2328.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files. This is repeated 60k times. Optimization
> is run every 2k indexed files. Index size is 50KB. I analyzed
> the heap dump and realized that the IndexWriter.synced field occupied more than
> half of the heap. That field is a private HashSet without a getter. Its task 
> is
> to hold files which have been synced already.
> There are two calls to addAll and one call to add on synced but no remove or
> clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer synced contains 32618 entries which
> look like file names "_e065_1.del" or "_e067.cfs"
> The index directory contains 10 files only.
> I guess synced is holding obsolete data 
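The description boils down to a set of file names that only ever grows. A minimal sketch of the kind of pruning the fix needs (class and method names are hypothetical, not the actual patch):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the leak and its fix: the set of already-synced file names
// must be pruned to the files the index still references, otherwise it
// accumulates entries for long-deleted files forever.
public final class SyncedFiles {
    private final Set<String> synced = new HashSet<>();

    public void markSynced(String fileName) {
        synced.add(fileName);
    }

    // Called e.g. after a commit: forget files that were deleted
    // from the index directory.
    public void pruneTo(Set<String> liveFiles) {
        synced.retainAll(liveFiles);
    }

    public int size() {
        return synced.size();
    }
}
```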

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848631#action_12848631
 ] 

Michael McCandless commented on LUCENE-2339:


I love CloseSafely!  We do that in a number of places and should simply call 
it, instead.  But can we change it to throw the first exception it encounters?

I also prefer Arrays.asList to be explicit.

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848628#action_12848628
 ] 

Michael McCandless commented on LUCENE-2335:


bq. I have spent some time tinkering with the problem of spanning multiple 
segments, and it seems to me that the generation of a "global" list of sorted 
ordinals should be feasible without too much overhead.

If you can do this, it'll be super-awesome :)  The holy grail of "sort by String"...

bq. As for facets, they are equivalent to sorting in the aspect that resolving 
the actual Strings can be delayed until the very end.

Ahh OK.

bq. I hope to have a patch out soon for SegmentReader so that it is possible to 
perform a sorted search "the Lucene way" rather than the hack I use in my proof 
of concept.

OK I'm looking forward to it!

bq. However, vacation starts friday...

Have a good vacation :)

> optimization: when sorting by field, if index has one segment and field 
> values are not needed, do not load String[] into field cache
> 
>
> Key: LUCENE-2335
> URL: https://issues.apache.org/jira/browse/LUCENE-2335
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
>
> Spinoff from java-dev thread "Sorting with little memory: A suggestion", 
> started by Toke Eskildsen.
> When sorting by SortField.STRING we currently ask FieldCache for a 
> StringIndex on that field.
> This can consume tons of RAM when the values are mostly unique (e.g. a title 
> field), as it populates both int[] ords as well as String[] values.
> But, if the index is only one segment, and the search sets fillFields=false, 
> we don't need the String[] values, just the int[] ords.  If the app needs to 
> show the fields it can pull them (for the 1 page) from stored fields.
> This can be a potent optimization -- a lot of RAM saved -- for optimized 
> indexes.
> When fixing this we must take care to share the int[] ords if some queries do 
> fillFields=true and some =false... ie, FieldCache will be called twice and it 
> should share the int[] ords across those invocations.
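The optimization rests on a simple invariant: within one segment, a term's ord (its position in the sorted term dictionary) orders the same way as its String value, so hit collection can compare plain ints and never materialize String[]. A hedged sketch with illustrative names (not Lucene's FieldCache API):

```java
import java.util.Arrays;
import java.util.Comparator;

public final class OrdSort {
    // ords[doc] = position of doc's term in the segment's sorted term
    // dictionary. Sorting docs by ord yields the same order as sorting
    // by the String values, without loading any String[] array.
    public static Integer[] docsByField(int[] ords) {
        Integer[] docs = new Integer[ords.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Arrays.sort(docs, Comparator.comparingInt(d -> ords[d]));
        return docs;
    }
}
```

If the app needs to display the field for the one page of hits it shows, it can fetch those few values from stored fields afterward.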

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848626#action_12848626
 ] 

Michael McCandless commented on LUCENE-2329:


Sweet, this looks great Michael!  Less RAM used and faster indexing (much less 
GC load) -- win/win.

It's a little surprising that the segment count dropped from 41 -> 32?  Ie, how 
much less RAM do the parallel arrays take?  They save the object header 
per-unique-term, and 4 bytes on 64-bit JREs since the "pointer" is now an int 
and not a real pointer.  But other things use RAM too (the docIDs in the postings 
themselves, norms, etc.), so it's surprising the savings were large enough to 
give 22% fewer segments... are you sure there isn't a bug in the RAM usage 
accounting?

> Use parallel arrays instead of PostingList objects
> --
>
> Key: LUCENE-2329
> URL: https://issues.apache.org/jira/browse/LUCENE-2329
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
> Attachments: lucene-2329.patch, lucene-2329.patch, lucene-2329.patch
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-lived PostingList objects in 
> TermsHashPerField we want to switch to parallel arrays.  The termsHash will 
> simply be an int[] which maps each term to dense termIDs.
> All data that the PostingList classes currently hold will then be placed in 
> parallel arrays, where the termID is the index into the arrays.  This will 
> avoid the need for object pooling and will remove the overhead of object 
> initialization and garbage collection.  Garbage collection especially should 
> benefit significantly when the JVM runs out of memory, because in such a 
> situation the gc mark times can get very long if there is a large number of 
> long-lived objects in memory.
> Another benefit could be more efficient TermVectors.  We could avoid 
> the need to store the term string per document in the TermVector.  
> Instead we could just store the segment-wide termIDs.  This would reduce the 
> size and also make it easier to implement efficient algorithms that use 
> TermVectors, because no term mapping across documents in a segment would be 
> necessary.  We can make this improvement in a separate JIRA issue, though.
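The parallel-arrays idea in the issue can be sketched like this (field and method names are illustrative, not taken from the actual patch):

```java
// Instead of one long-lived PostingList object per unique term, keep
// one slot per dense termID in parallel primitive arrays: no per-term
// object headers, no pooling, and far less GC mark work.
public final class ParallelPostings {
    private final int[] lastDocIDs;   // last doc seen for each termID
    private final int[] docFreqs;     // number of docs containing termID

    public ParallelPostings(int maxTerms) {
        lastDocIDs = new int[maxTerms];
        docFreqs = new int[maxTerms];
        java.util.Arrays.fill(lastDocIDs, -1);   // -1 = no doc seen yet
    }

    // Record that term `termID` occurs in document `docID`.
    public void addOccurrence(int termID, int docID) {
        if (lastDocIDs[termID] != docID) {   // first occurrence in this doc
            lastDocIDs[termID] = docID;
            docFreqs[termID]++;
        }
    }

    public int docFreq(int termID) {
        return docFreqs[termID];
    }
}
```

All state for a term lives at index termID, so growing the buffers means reallocating a few flat arrays rather than pooling millions of small objects.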

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848606#action_12848606
 ] 

Shai Erera commented on LUCENE-2339:


Sorry ... I was confused w/ the for loop of Java 5 :). Let's keep it Collection 
then. Sorry for the hassle.

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> Par example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848591#action_12848591
 ] 

Uwe Schindler edited comment on LUCENE-2339 at 3/23/10 7:17 AM:


bq. I just wanted to avoid converting arrays to a Collection, just so that they 
can be iterated on. 

Sorry for the dumb question: in which JDK do arrays implement Iterable? From 
my knowledge and a quick check with Java 5, they do not. Passing an array to a 
method taking Iterable does not work. Arrays only work in the extended for 
statement, but not because they are Iterable. The code generated by javac is 
also totally different (and more efficient than creating an iterator: it just 
uses a conventional for(int i = 0; i < length; i++) loop). See:
- [http://stackoverflow.com/questions/1160081/why-is-an-array-not-assignable-to-iterable]
- [http://72.5.124.102/thread.jspa?threadID=558036&tstart=607]

And where is the waste in calling Arrays.asList()? It has exactly the same 
overhead as creating an iterator() would, if arrays were Iterable; both are just 
"views" on the array, so no copy is involved.
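Both of Uwe's points — arrays are not Iterable, and Arrays.asList() is a copy-free view — can be checked directly:

```java
import java.util.Arrays;
import java.util.List;

public class ArrayIterableDemo {
    public static void main(String[] args) {
        String[] files = {"_1.cfs", "_2.cfs"};

        // The extended for statement accepts arrays, but javac compiles
        // it to a plain indexed loop, not to an Iterator:
        for (String f : files) {
            System.out.println(f);
        }

        // An array is NOT an Iterable; this line would not compile:
        // Iterable<String> it = files;   // error: incompatible types

        // Arrays.asList wraps the array without copying: it is a
        // fixed-size view, so writes through the list show up in the
        // backing array.
        List<String> view = Arrays.asList(files);
        view.set(0, "_9.cfs");
        System.out.println(files[0]);   // prints "_9.cfs"
    }
}
```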

  was (Author: thetaphi):
bq. I just wanted to avoid converting arrays to a Collection, just so that 
they can be iterated on. 

Sorry for the dumb question: in which JDK do arrays implement Iterable? From 
my knowledge and a quick check with Java 5, they do not. Passing an array to a 
method taking Iterable does not work. Arrays only work in the extended for 
statement, but not because they are Iterable. The code generated by javac is 
also totally different (and more efficient than creating an iterator: it just 
uses a conventional for(int i = 0; i < length; i++) loop). See:
- [http://stackoverflow.com/questions/1160081/why-is-an-array-not-assignable-to-iterable]
- [http://72.5.124.102/thread.jspa?threadID=558036&tstart=607]
  
> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> For example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848591#action_12848591
 ] 

Uwe Schindler commented on LUCENE-2339:
---

bq. I just wanted to avoid converting arrays to a Collection, just so that they 
can be iterated on. 

Sorry for the dumb question: in which JDK do arrays implement Iterable? From 
my knowledge and a quick check with Java 5, they do not. Passing an array to a 
method taking Iterable does not compile. Arrays only work in the extended for 
statement, but not because they are Iterable. The code generated by javac is 
also totally different (and more efficient than creating an iterator: it just 
uses the conventional for(i=0; i < length; i++) loop). See:
- [http://stackoverflow.com/questions/1160081/why-is-an-array-not-assignable-to-iterable]
- [http://72.5.124.102/thread.jspa?threadID=558036&tstart=607]

> Allow Directory.copy() to accept a collection of file names to be copied
> 
>
> Key: LUCENE-2339
> URL: https://issues.apache.org/jira/browse/LUCENE-2339
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Michael McCandless
> Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch
>
>
> For example, I want to copy files pertaining to a certain commit, and not 
> everything there is in a Directory.




[jira] Commented: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache

2010-03-23 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848589#action_12848589
 ] 

Toke Eskildsen commented on LUCENE-2335:


I can see that I messed up reading your previous answer regarding stored 
fields. Let's just forget it so as not to confuse the issue further.

As for facets, they are equivalent to sorting in that resolving the actual 
Strings can be delayed until the very end. I'll try to contain myself on the 
facet subject, though, and focus on sorting.

I have used some time tinkering with the problem of spanning multiple segments 
and it seems to me that the generation of a "global" list of sorted ordinals 
should be feasible without too much overhead. Basically we want to preserve 
sequential access as much as possible, so merging sorted ordinals from segments 
will benefit from a read-ahead cache. By letting the reader deliver ordinals 
through an iterator, it is free to implement such a cache when necessary. I 
envision the signature to be something like
{code}
Iterator<OrdinalTerm> getOrdinalTerms(
  String persistenceKey, Comparator comparator, String field,
  boolean collectDocIDs) throws IOException;
{code}
where OrdinalTerm holds the ordinal, the Term, and the docID.

The beauty of all this is that the mapping is from docID->sortedOrdinal index 
(which it has to be for fast comparison), so keeping the possibility of 
resolving the Strings after the sort (fillFields=true) is free in terms of 
storage space and processing time.
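The segment-merging step described above can be sketched as a standard k-way merge: each per-segment iterator is already sorted, so a small priority queue produces one globally sorted stream while reading every segment sequentially. This is an illustrative sketch, not Lucene API; the names (OrdinalMerge, SegmentEntry, mergeSorted) are hypothetical.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class OrdinalMerge {

    // Merge several already-sorted iterators into one sorted stream.
    // Each segment is consumed strictly in order, which preserves the
    // sequential access pattern that a read-ahead cache benefits from.
    static <T> Iterator<T> mergeSorted(List<Iterator<T>> segments,
                                       Comparator<T> cmp) {
        final PriorityQueue<SegmentEntry<T>> queue =
            new PriorityQueue<>(Math.max(1, segments.size()),
                                (a, b) -> cmp.compare(a.value, b.value));
        // Seed the queue with the head element of every non-empty segment.
        for (Iterator<T> it : segments) {
            if (it.hasNext()) {
                queue.add(new SegmentEntry<>(it.next(), it));
            }
        }
        return new Iterator<T>() {
            @Override public boolean hasNext() { return !queue.isEmpty(); }
            @Override public T next() {
                // Take the globally smallest head, then refill from
                // the segment it came from.
                SegmentEntry<T> top = queue.poll();
                if (top.source.hasNext()) {
                    queue.add(new SegmentEntry<>(top.source.next(), top.source));
                }
                return top.value;
            }
        };
    }

    // Pairs a segment's current head value with its iterator.
    private static final class SegmentEntry<T> {
        final T value;
        final Iterator<T> source;
        SegmentEntry(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }
}
```

The queue holds only one entry per segment, so memory overhead is proportional to the segment count, not the term count.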

I hope to have a patch out soon for SegmentReader so that it is possible to 
perform a sorted search "the Lucene way" rather than via the hack I use in my 
proof of concept. However, vacation starts Friday...

> optimization: when sorting by field, if index has one segment and field 
> values are not needed, do not load String[] into field cache
> 
>
> Key: LUCENE-2335
> URL: https://issues.apache.org/jira/browse/LUCENE-2335
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
>
> Spinoff from java-dev thread "Sorting with little memory: A suggestion", 
> started by Toke Eskildsen.
> When sorting by SortField.STRING we currently ask FieldCache for a 
> StringIndex on that field.
> This can consume tons of RAM when the values are mostly unique (e.g. a title 
> field), as it populates both int[] ords as well as String[] values.
> But, if the index is only one segment, and the search sets fillFields=false, 
> we don't need the String[] values, just the int[] ords.  If the app needs to 
> show the fields it can pull them (for the 1 page) from stored fields.
> This can be a potent optimization -- a lot of RAM saved -- for optimized 
> indexes.
> When fixing this we must take care to share the int[] ords if some queries do 
> fillFields=true and some =false... i.e., FieldCache will be called twice and 
> it should share the int[] ords across those invocations.
