[jira] Closed: (LUCENENET-384) QueryParsers exception on Windows 2008 Server

2011-01-15 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy closed LUCENENET-384.
--

Resolution: Cannot Reproduce

Hi Rida,
If you still have the same bug (with recent versions), you can open a new issue 
with more info.
I'm closing this one.

DIGY

 QueryParsers exception on Windows 2008 Server
 -

 Key: LUCENENET-384
 URL: https://issues.apache.org/jira/browse/LUCENENET-384
 Project: Lucene.Net
  Issue Type: Bug
 Environment: Lucene.Net 2.0.0.4
 OS: Windows 2008 Server / 32bit
Reporter: Rida Al-Masri
Priority: Blocker

 I have developed an application that uses Lucene.Net 2.0.0.4, and it works very 
 well on Windows XP and Windows 2003 Server, but when I tried to use this 
 application on Windows 2008 Server / 32bit it raises 
 Lucene.Net.QueryParsers.ParseException for all the supplied queries.
 Your attention to this issue is highly appreciated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (LUCENENET-375) Getting assert in SegmentReader.cs (Lucene.net_2_9_2)

2011-01-15 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-375.


Resolution: Fixed

Patch applied to 2.9.2 branch and trunk.

DIGY

 Getting assert in SegmentReader.cs (Lucene.net_2_9_2)
 -

 Key: LUCENENET-375
 URL: https://issues.apache.org/jira/browse/LUCENENET-375
 Project: Lucene.Net
  Issue Type: Bug
Reporter: Digy
 Attachments: SegmentReader.patch


 Reported by *Patrick Ng* and *Kevin Miller* on the mailing lists.
 No feedback yet :(
 Reason: the Java version of Norm.Clone is implemented in a synchronized 
 function, but the synchronization was somehow omitted in Lucene.Net.
 DIGY
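
 For context, a minimal sketch of the synchronized-clone pattern referred to 
 above (illustrative only; not Lucene's actual Norm class):
 {code}
 // clone() is declared synchronized so a concurrent cloner cannot observe
 // a half-copied object -- the guard the .NET port lost.
 class Norm implements Cloneable {
     private byte[] bytes = new byte[16];

     @Override
     public synchronized Object clone() {
         Norm copy = new Norm();
         copy.bytes = this.bytes.clone(); // copy state under the monitor
         return copy;
     }
 }
 {code}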

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (LUCENENET-376) Ver.2.9.2 SpanOrQuery.ToString() bug

2011-01-15 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy resolved LUCENENET-376.


Resolution: Fixed
  Assignee: Digy

Patch applied to 2.9.2 and trunk

DIGY

 Ver.2.9.2 SpanOrQuery.ToString() bug
 

 Key: LUCENENET-376
 URL: https://issues.apache.org/jira/browse/LUCENENET-376
 Project: Lucene.Net
  Issue Type: Bug
Reporter: Andrei Iliev
Assignee: Digy
 Attachments: SpanOrQuery.patch


 Bad conversion from Java code: the extra i.MoveNext() inside the loop advances 
 the enumerator a second time, so clauses get skipped.
 
   System.Collections.IEnumerator i = clauses.GetEnumerator();
   while (i.MoveNext())
   {
       SpanQuery clause = (SpanQuery) i.Current;
       buffer.Append(clause.ToString(field));
       if (i.MoveNext())
       {
           buffer.Append(", ");
       }
   }
 
 Should be changed to something like:
 
   int j = 0;
   while (i.MoveNext())
   {
       j++;
       SpanQuery clause = (SpanQuery) i.Current;
       buffer.Append(clause.ToString(field));
       if (j < clauses.Count)
       {
           buffer.Append(", ");
       }
   }
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-2315) analysis.jsp highlight matches no longer works

2011-01-15 Thread Pradeep (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982044#action_12982044
 ] 

Pradeep commented on SOLR-2315:
---

I have one-week-old code. It works for me.

 analysis.jsp highlight matches no longer works
 

 Key: SOLR-2315
 URL: https://issues.apache.org/jira/browse/SOLR-2315
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Hoss Man
 Fix For: 3.1, 4.0


 As noted by Teruhiko Kurosaka on the mailing list, at some point since Solr 
 1.4, highlight matches stopped working on analysis.jsp -- on both the 
 3x and trunk branches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr-trunk - Build # 1374 - Failure

2011-01-15 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1374/

All tests passed

Build Log (for compile errors):
[...truncated 19409 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982057#action_12982057
 ] 

Uwe Schindler commented on LUCENE-2858:
---

Any comments about removing write access from IndexReaders? I think setNorms() 
will be removed soon, but how about the others? I would propose to also make all 
IndexReaders simply *readers*, not writers.

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982057#action_12982057
 ] 

Uwe Schindler edited comment on LUCENE-2858 at 1/15/11 5:26 AM:


Any comments about removing write access from IndexReaders? I think setNorms() 
will be removed soon, but how about the others like deleteDocument()? I would 
propose to also make all IndexReaders simply *readers*, not writers.

  was (Author: thetaphi):
Any comments about removing write access from IndexReaders? I think 
setNorms() will be removed soo, but how about the others? I would propose to 
also make all IndexReaders simply *readers* not writers?
  
 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering

2011-01-15 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982061#action_12982061
 ] 

Dawid Weiss commented on SOLR-2282:
---

I think I nailed it. I whitebox-inspected the Carrot2 code and thought it 
impossible for a concurrency bug to creep in (in particular with a simple 
controller), but what we didn't take into account is that the Carrot2 
infrastructure itself allows a scenario in which a single object instance is 
bound to multiple components at runtime (and is then effectively shared in a 
multi-threaded context). This code happens to be in Solr's code base, not in 
Carrot2. The bug happens because of the following series of events:

1) The controller in Solr is initialized with a single instance of new 
LuceneLanguageModelFactory() -- this factory is then injected into all 
components at runtime.
2) The base class of LuceneLanguageModelFactory is DefaultLanguageModelFactory, 
which has an object-local cache of stemmers and tokenizers. In Carrot2 3.4.2, 
factories are component-bound anyway, so a factory can reuse its resources. In 
the trunk version, this is no longer the case (factories simply create new 
objects as they are requested).
3) Because of the tokenizer/stemmer cache, tokenizers and stemmers can be 
used in parallel when two requests are made at the same time. I think this 
should be fairly repeatable on all computers, regardless of the number of 
cores or their speed; it's just a matter of time. Clustering takes relatively 
long compared to tokenization, so for two tokenizations to overlap (and screw 
up internal data structures) is a rare event (and yet, as we could see, 
frequent enough to manifest itself during tests).

{noformat}
// Customize the language model factory. The implementation we provide here
// is included in the code base of Solr, so that it's possible to refactor
// the Lucene APIs the factory relies on if needed.
initAttributes.put("PreprocessingPipeline.languageModelFactory",
  new LuceneLanguageModelFactory());
this.controller.init(initAttributes);
{noformat}
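
A minimal standalone sketch (invented names; not Carrot2 or Solr code) of why 
an object-local cache in a shared factory instance breaks under concurrent 
requests:

{noformat}
import java.util.HashMap;
import java.util.Map;

class SharedFactory {
    // Object-local cache, analogous to the stemmer/tokenizer cache in
    // DefaultLanguageModelFactory: every component sharing this factory
    // instance also shares the cached, stateful objects it hands out.
    private final Map<String, StringBuilder> cache =
        new HashMap<String, StringBuilder>();

    StringBuilder tokenizerFor(String language) {
        // Unsynchronized get-or-create: two request threads can receive
        // the SAME mutable object and corrupt its internal state.
        StringBuilder tokenizer = cache.get(language);
        if (tokenizer == null) {
            tokenizer = new StringBuilder();
            cache.put(language, tokenizer);
        }
        return tokenizer;
    }
}
{noformat}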

The fix for the problem would be to:

1) upgrade to trunk/future Carrot2 version (because of different memory 
management in factories),
2) pass a class instead of an instance to the initialization parameters. So 
this should do:

{noformat}
// Customize the language model factory. The implementation we provide here
// is included in the code base of Solr, so that it's possible to refactor
// the Lucene APIs the factory relies on if needed.
initAttributes.put("PreprocessingPipeline.languageModelFactory",
  LuceneLanguageModelFactory.class);
this.controller.init(initAttributes);
{noformat}

Works on my machine :) But I'll let Staszek review this again so that we're 
sure it's really this.



 Distributed Support for Search Result Clustering
 

 Key: SOLR-2282
 URL: https://issues.apache.org/jira/browse/SOLR-2282
 Project: Solr
  Issue Type: New Feature
  Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, 
 SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, 
 SOLR-2282_test.patch


 Brad Giaccio contributed a patch for this in SOLR-769. I'd like to 
 incorporate it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering

2011-01-15 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982062#action_12982062
 ] 

Dawid Weiss commented on SOLR-2282:
---

One more side comment for those interested. I used my favorite technique for 
debugging such things -- created another project in Eclipse (AspectJ-enabled), 
created a runtime weaving launch config in Eclipse that started that particular 
test, wrote this aspect:

{noformat}
package com.carrotsearch.aspects;

import java.util.HashMap;

/**
 * Check for multithreaded access in supposedly single-threaded objects.
 */
public aspect Solr2282
{
    pointcut guardedMethods() :
        execution(* org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.*(..));

    private HashMap<Object, Thread> t = new HashMap<Object, Thread>();

    Object around() : guardedMethods()
    {
        Object tokenizer = thisJoinPoint.getThis();
        Thread current = Thread.currentThread();
        try {
            synchronized (Solr2282.class) {
                Thread owner = t.get(tokenizer);
                if (owner != null && owner != current)
                    halt();
                t.put(tokenizer, current);
            }

            return proceed();
        } catch (Throwable e) {
            halt();
            return null;
        } finally {
            synchronized (Solr2282.class) {
                Thread owner = t.get(tokenizer);
                if (owner != null && owner != current)
                    halt();
                t.remove(tokenizer);
            }
        }
    }

    private void halt()
    {
        System.out.println("## HALT! ");
    }
}
{noformat}

and placed a VM-halting breakpoint in sysout inside halt()... Once I got two 
threads running on the same tokenizer instance, it was a matter of inspecting 
which objects are shared and how this could possibly happen. 

Aspect-oriented programming never really won me over, but as a debugging and 
performance-analysis tool it simply rocks.

 Distributed Support for Search Result Clustering
 

 Key: SOLR-2282
 URL: https://issues.apache.org/jira/browse/SOLR-2282
 Project: Solr
  Issue Type: New Feature
  Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, 
 SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, 
 SOLR-2282_test.patch


 Brad Giaccio contributed a patch for this in SOLR-769. I'd like to 
 incorporate it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release schedule Lucene 4?

2011-01-15 Thread Michael McCandless
This is unfortunately hard to say!

There's tons of good stuff in 4.0, so we'd really like to release
sooner rather than later.

But then there's also a lot of work remaining, e.g. we have 3 feature
branches in flight right now that we need to wrap up and land on
trunk:

  * realtime (gives us concurrent flushing during indexing)

  * docvalues (adds column-stride fields)

  * bulkpostings (gives good search speedup for intblock codecs)

Plus many open Jira issues.  So it's hard to predict when all of this
will be done

Mike

On Fri, Jan 14, 2011 at 12:31 PM, Gregor Heinrich gre...@arbylon.net wrote:
 Dear Lucene team,

 I am wondering whether there is an updated Lucene release schedule for the
 v4.0 stream.

 Any earliest/latest alpha/beta/stable date? And if not yet, where to track
 such info?

 Thanks in advance from Germany

 gregor

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: CorruptIndexException when indexing

2011-01-15 Thread Michael McCandless
Different ramBufferSizeMB during indexing should never cause corruption!

Can you try setting the ram buffer to 256 MB in your test env and see
if that makes the corruption go away?

This could also be a hardware issue in your test env.  If you run
CheckIndex on the corrupt index, does it always fail in the same way?
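
For reference, a hedged sketch (Lucene 2.9.x API; the index path is a
placeholder) of running CheckIndex programmatically:

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CheckCorruptIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/path/to/index"));
        CheckIndex checker = new CheckIndex(dir);
        checker.setInfoStream(System.out); // print per-segment details
        CheckIndex.Status status = checker.checkIndex();
        System.out.println(status.clean ? "Index is OK" : "Index is corrupt");
    }
}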

Mike

On Fri, Jan 14, 2011 at 6:43 AM, Li Li fancye...@gmail.com wrote:
 hi all,
   we have confronted this problem 3 times during testing.
   The exception stack is:
 Exception in thread "Lucene Merge Thread #2"
 org.apache.lucene.index.MergePolicy$MergeException:
 org.apache.lucene.index.CorruptIndexException: docs out of order (7286
 <= 7286 )
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:355)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:319)
 Caused by: org.apache.lucene.index.CorruptIndexException: docs out of
 order (7286 <= 7286 )
        at 
 org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:75)
        at 
 org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:880)
        at 
 org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:818)
        at 
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:756)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:187)
        at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5354)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4937)

    Or:
 Exception in thread "Lucene Merge Thread #0"
 org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.ArrayIndexOutOfBoundsException: 330
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:355)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:319)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 330
        at org.apache.lucene.util.BitVector.get(BitVector.java:102)
        at 
 org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:238)
        at 
 org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:168)
        at 
 org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
        at 
 org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:870)
        at 
 org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:818)
        at 
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:756)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:187)
        at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5354)


   We made some minor modifications based on Lucene 2.9.1 and Solr
 1.4.0: we modified the .frq file to store 4 bytes for the positions
 where the term occurred in each document (accessing the full positions
 in the .prx file is too time-consuming for our needs). I can't tell
 whether it's our bug or Lucene's own bug.
   I searched the mailing list and found the mail "problem during index
 merge" posted on 2010-10-21. It's similar to our case.
   It seems the docList in the .frq file is stored wrongly. When it is
 decoded during merging, either the wrong docID is larger than maxDocs
 (the BitVector deletedDocs), which causes the second exception, or the
 docID delta is less than 0 (it reads wrongly), which causes the first
 exception.
   We are still continuing testing with our modification turned off and
 infoStream enabled in solrconfig.xml.

   We found a strange phenomenon: when we test, it sometimes hits
 exceptions, but in our production environment it never hits any.
   The hardware and software environments are the same. We checked
 carefully, and the only difference is this line in solrconfig.xml:
  <ramBufferSizeMB>32</ramBufferSizeMB>   in the testing environment
  <ramBufferSizeMB>256</ramBufferSizeMB>  in the production environment
  The number of indexed documents per machine is also roughly the
 same: 10M+ documents.
  I can't be sure the indices in the production env are correct, because
 even if some terms' docLists are wrong, as long as each doc delta is > 0
 and there are no deleted documents, it will not hit the two exceptions.
  We checked the search results in the production env and didn't find
 anything strange.

  Can a ramBufferSizeMB that is too small result in index corruption?

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982102#action_12982102
 ] 

Robert Muir commented on LUCENE-2858:
-

bq. I think setNorms() will be removed soon

Why do you think this?

On the norms cleanup issue, I only removed setNorm(float), because it's 
completely useless.
All it did was call Similarity.getDefault().encode(float) + setNorm(byte).


 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2282) Distributed Support for Search Result Clustering

2011-01-15 Thread Stanislaw Osinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislaw Osinski updated SOLR-2282:


Attachment: SOLR-2282-concurrency-branch_3x.patch
SOLR-2282-concurrency-trunk.patch

Thanks for debugging this, Dawid! I think solution 2) you suggested would be 
the best because it applies both to version 3.4.2 of Carrot2 (currently used by 
Solr) and the 3.5.0 version (not yet released).

I'm attaching patches for Solr trunk and branch_3x that fix the concurrency 
issue and correct a typo in a log message output by 
{{LuceneLanguageModelFactory}}.

 Distributed Support for Search Result Clustering
 

 Key: SOLR-2282
 URL: https://issues.apache.org/jira/browse/SOLR-2282
 Project: Solr
  Issue Type: New Feature
  Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2282-concurrency-branch_3x.patch, 
 SOLR-2282-concurrency-trunk.patch, SOLR-2282-diagnostics.patch, 
 SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, 
 SOLR-2282.patch, SOLR-2282_test.patch


 Brad Giaccio contributed a patch for this in SOLR-769. I'd like to 
 incorporate it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2862) Track total term freq per term

2011-01-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2862.


Resolution: Fixed

 Track total term freq per term
 --

 Key: LUCENE-2862
 URL: https://issues.apache.org/jira/browse/LUCENE-2862
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2862.patch


 Right now we track docFreq for each term (how many docs have the
 term), but the totalTermFreq (total number of occurrences of this
 term, i.e. the sum of freq() for each doc that has the term) is also a
 useful stat (for flex scoring, PulsingCodec, etc.).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982107#action_12982107
 ] 

Uwe Schindler commented on LUCENE-2858:
---

I was talking about replacing norms by CSF; maybe it's just not soon.

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982115#action_12982115
 ] 

Robert Muir commented on LUCENE-2858:
-

Ah, ok, sorry, I was confused. Still, I think we would need this method 
(somewhere) even with CSF, so that people can change the norms and have them 
instantly take effect for searches.


 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release schedule Lucene 4?

2011-01-15 Thread Shai Erera
Well … we can decide on a list of features we want in 4.0 (e.g., the 3
you mention above), estimate the time it would take to finish them, and
then give a release date (or dates). That will get us to a release faster
than if we wait for all JIRA issues to end, plus the separate branches we
work on.

We should also decide on a release date for 3x.

And as usual, we should release more often than we do today :).

Shai

On Saturday, January 15, 2011, Michael McCandless
luc...@mikemccandless.com wrote:
 This is unfortunately hard to say!

 There's tons of good stuff in 4.0, so we'd really like to release
 sooner rather than later.

 But then there's also a lot of work remaining, e.g. we have 3 feature
 branches in flight right now that we need to wrap up and land on
 trunk:

   * realtime (gives us concurrent flushing during indexing)

   * docvalues (adds column-stride fields)

   * bulkpostings (gives good search speedup for intblock codecs)

 Plus many open Jira issues.  So it's hard to predict when all of this
 will be done

 Mike

 On Fri, Jan 14, 2011 at 12:31 PM, Gregor Heinrich gre...@arbylon.net wrote:
 Dear Lucene team,

 I am wondering whether there is an updated Lucene release schedule for the
 v4.0 stream.

 Any earliest/latest alpha/beta/stable date? And if not yet, where to track
 such info?

 Thanks in advance from Germany

 gregor

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982126#action_12982126
 ] 

Earwin Burrfoot commented on LUCENE-2858:
-

bq. Any comments about removing write access from IndexReaders? I think 
setNorms() will be removed soon, but how about the others like 
deleteDocument()? I would propose to also make all IndexReaders simply readers, 
not writers.

Voting with all my extremities - yes!!

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982132#action_12982132
 ] 

Earwin Burrfoot commented on LUCENE-2858:
-

bq. Still, I think we would need this method (somewhere) even with CSF, so that 
people can change the norms and have them instantly take effect for searches.

This still puzzles me. If I strain my imagination, I can picture people who just 
need to change norms without reindexing.
But doing this and *requiring* instant turnaround? Kid me not :)


 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982134#action_12982134
 ] 

Michael McCandless commented on LUCENE-2858:


I don't think we should remove setNorm/deleteDocuments, even from the composite 
reader class.

Deleting docs from IR has advantages over deleting from IW: the change is 
"live" to any searches running on that IR; you get an immediate count of how 
many docs were deleted; and you can delete by docID.

setNorm is also useful in that it can be used to boost docs (globally), live, 
if that reader is being used for searching.  When/if we cut over norms -> doc 
values we'll have to decide what to do about setNorm...
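
For concreteness, a hedged sketch of the write methods under discussion as they 
exist on the pre-4.0 IndexReader (directory setup omitted; field names and term 
values are placeholders):

{noformat}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

void writeViaReader(Directory dir) throws Exception {
    IndexReader reader = IndexReader.open(dir, false); // read-write reader

    // Delete by docID -- not possible via IndexWriter, and "live" for
    // searches already running on this reader.
    reader.deleteDocument(7);

    // Delete by Term; unlike IndexWriter.deleteDocuments(Term), this
    // returns an immediate count of deleted docs.
    int count = reader.deleteDocuments(new Term("id", "42"));

    // Boost a doc's field norm globally, live, while searching.
    reader.setNorm(3, "body", (byte) 120);

    reader.close(); // commits the deletions and norm changes
}
{noformat}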

At a higher level, for this strong typing of atomic vs composite IRs, we 
shouldn't try to change functionality -- let's just do a rote refactoring, such 
that methods that now throw UOE on IR are moved to the atomic reader only.

Separately we can think about whether existing functions should be dropped...

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2657:


Attachment: LUCENE-2657.patch

This patch cuts over {{ant generate-maven-artifacts}} to directly use {{mvn}}.  
After applying this patch, Maven 2.2.1 (and maybe 3.0.X? - untested) must be 
installed on your machine in order to run {{ant generate-maven-artifacts}}.

Other changes in this patch:

# Dropped all use of Maven Ant Tasks.
# The top-level {{ant generate-maven-artifacts}} now works and is the best way 
to perform this task, since it will create a single timestamp for all 
artifacts; this target can also be run from either {{solr/}} or {{lucene/}}.
# Removed the {{generate-maven-artifacts}} target from {{modules/build.xml}}, 
and transferred the responsibility for generating {{modules/*}} maven artifacts 
to {{lucene/build.xml}}.
# The {{solr/src/webapp/}} module no longer installs or deploys its (empty) 
sources jar.
# Remote Maven artifact deployment is no longer included in the Ant build - 
this can be performed by the Maven build.
# {{mvn clean}} no longer removes {{solr/dist/}} or {{lucene/dist/}}, for two 
reasons:
## The Ant build populates {{dist/}} with things that the Maven build should 
not remove.  Removing just {{dist/maven/}} won't work, because:
## I couldn't find a nice/simple way to remove a directory just once in the 
reactor build.  The previous patch attempted to do this from the lucene core 
and solr core modules, but that solution was deleting their deployed parent 
POMs, since the reactor build orders the Solr and Lucene parent POMs before the 
core modules (the parent relationship requires this).

Left to do:
# Add Ant targets to test the Maven artifacts
# Backport to branch_3x
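
For reference, the resulting invocation (a sketch; assumes Maven 2.2.1 is 
already installed, per the note above):
{code}
# From the top level, or from solr/ or lucene/:
ant generate-maven-artifacts
{code}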


 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982152#action_12982152
 ] 

Marvin Humphrey commented on LUCENE-2858:
-

 Deleting docs from IR has advantages over deleting from IW: the change is
 live to any searches running on that IR; you get an immediate count of how
 many docs were deleted; you can delete by docID.

Alternate plan:

  * Move responsibility for deletions to a pluggable DeletionsReader
subcomponent of SegmentReader.
  * Have the default DeletionsReader be read-only.
  * People who need the esoteric functionality described above can use a
subclass of DeletionsReader.
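
A hedged sketch of the shape this plan implies (hypothetical names following 
the comment above; this is not an existing Lucene API):

{noformat}
// Pluggable and read-only by default.
interface DeletionsReader {
    boolean isDeleted(int docID);
}

// Default implementation: a read-only view over the segment's deletions.
class ReadOnlyDeletionsReader implements DeletionsReader {
    private final java.util.BitSet deleted;

    ReadOnlyDeletionsReader(java.util.BitSet deleted) {
        this.deleted = deleted;
    }

    public boolean isDeleted(int docID) {
        return deleted.get(docID);
    }
}

// Callers needing live deletes would plug in a mutable subclass instead.
{noformat}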


 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982166#action_12982166
 ] 

Earwin Burrfoot commented on LUCENE-2858:
-

APIs have to be there still. All that commity, segment-deletery, mutabley stuff 
(that spans both atomic and composite readers).
So, while your plan is viable, it won't remove that much cruft.

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil method directly built in (possibly as a "view" 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore? When a 
 segment changes, you need a new atomic reader? - maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-3.x - Build # 243 - Still Failing

2011-01-15 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/243/

All tests passed

Build Log (for compile errors):
[...truncated 21065 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2282) Distributed Support for Search Result Clustering

2011-01-15 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2282.
--

Resolution: Fixed

Thanks everyone!

trunk: Committed revision 1059426.
3x: Committed revision 1059428.


 Distributed Support for Search Result Clustering
 

 Key: SOLR-2282
 URL: https://issues.apache.org/jira/browse/SOLR-2282
 Project: Solr
  Issue Type: New Feature
  Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2282-concurrency-branch_3x.patch, 
 SOLR-2282-concurrency-trunk.patch, SOLR-2282-diagnostics.patch, 
 SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, 
 SOLR-2282.patch, SOLR-2282_test.patch


 Brad Giaccio contributed a patch for this in SOLR-769. I'd like to 
 incorporate it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1191) NullPointerException in delta import

2011-01-15 Thread Gunnlaugur Thor Briem (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982193#action_12982193
 ] 

Gunnlaugur Thor Briem commented on SOLR-1191:
-

I see this still happening on the tip of the 3.1 branch:

{quote}
Jan 15, 2011 9:47:39 PM org.apache.solr.handler.dataimport.DataImporter 
doDeltaImport
SEVERE: Delta Import Failed
java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:860)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:282)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:176)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:356)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
{quote}

Same kind of problem:

{quote}
 pk="ds.id"
 deltaQuery="SELECT id FROM [...]"
{quote}

and the same kind of workaround:

{quote}
 pk="ds.id"
 deltaQuery="SELECT id AS &quot;ds.id&quot; FROM [...]"
{quote}


 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows  Linux.
 Java: 1.6
 DB: MySQL  SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4


 Seeing a few of these NullPointerExceptions during delta imports. Once this 
 happens, delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem, and delta 
 import starts working again.
 Here is the log just before & after the exception:
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-15 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982200#action_12982200
 ] 

Michael Busch commented on LUCENE-2324:
---

I just committed fixes for some failing tests.  
E.g., the addIndexes() problem is now fixed.  The problem was that I had 
accidentally removed the following line in DW.addIndexes():

{code}
 // Update SI appropriately
 info.setDocStore(info.getDocStoreOffset(), newDsName, info.getDocStoreIsCompoundFile());
{code}

info.setDocStore() calls clearFiles(), which empties a SegmentInfo-local cache 
of all filenames that belong to the corresponding segment.  Since addIndexes() 
changes the segment name, it is important to refill that cache with the new 
file names.

This was a sneaky bug.  We should probably call clearFiles() explicitly there 
in addIndexes().  For now I added a comment.
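
A minimal sketch of the explicit variant (hedged: setDocStore() already invokes clearFiles() internally, and the extra call below assumes clearFiles() is reachable from DW.addIndexes(), i.e. same package; it would only make the intent visible):

{code}
// Update SI appropriately
info.setDocStore(info.getDocStoreOffset(), newDsName, info.getDocStoreIsCompoundFile());
// Hypothetical explicit call: drop the cached file list so it is rebuilt
// under the new segment name. setDocStore() already triggers this, so the
// call would only document the intent.
info.clearFiles();
{code}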

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, 
 test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.
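
As a rough illustration of the buffering model in that summary (a conceptual sketch only, not the branch's code; every name below is hypothetical): each indexing thread checks out a private in-RAM segment writer, so segments build and flush independently of one another.

{code}
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a per-thread private RAM segment.
class RamSegmentWriter {
  private long ramUsed;
  void addDocument(Object doc) { ramUsed += 1024; /* buffer into this segment only */ }
  long ramUsed() { return ramUsed; }
  void flush() { ramUsed = 0; /* write this one segment to disk on its own */ }
}

class RamSegmentPool {
  private static final long FLUSH_THRESHOLD = 16L << 20;  // hypothetical 16 MB per segment
  private final ConcurrentLinkedQueue<RamSegmentWriter> idle =
      new ConcurrentLinkedQueue<RamSegmentWriter>();

  RamSegmentWriter checkout() {
    RamSegmentWriter w = idle.poll();                // reuse an idle writer...
    return w != null ? w : new RamSegmentWriter();   // ...or grow toward N writers
  }

  void checkin(RamSegmentWriter w) {
    if (w.ramUsed() > FLUSH_THRESHOLD) w.flush();    // flush without blocking peers
    idle.offer(w);
  }
}
{code}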

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier

2011-01-15 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani resolved LUCENE-2867.
--

Resolution: Fixed
  Assignee: Adriano Crestani

Thanks for reviewing the patch Simon!

The patch was applied on revision 1059436

 Change contrib QP API that uses CharSequence as string identifier
 -

 Key: LUCENE-2867
 URL: https://issues.apache.org/jira/browse/LUCENE-2867
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 3.0.3
Reporter: Adriano Crestani
Assignee: Adriano Crestani
Priority: Minor
 Fix For: 3.0.4

 Attachments: lucene_2867_adriano_crestani_2011_01_13.patch


 There are some API methods on the contrib queryparser that expect CharSequence 
 as an identifier. This is wrong, since it may lead to incorrect or misleading 
 behavior, as shown on LUCENE-2855. To avoid this problem, these APIs will be 
 changed to enforce the use of String instead of CharSequence in version 4. 
 This patch already deprecates the old API methods and adds new substitute 
 methods that use only String.
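
The change follows the usual deprecate-and-delegate shape; a hedged sketch with illustrative names (not the exact contrib queryparser API) of why String keys are safer than CharSequence ones:

{code}
import java.util.HashMap;
import java.util.Map;

public class ConfigHandlerSketch {
  static class FieldConfig {}  // stub type for the sketch

  private final Map<String, FieldConfig> configs = new HashMap<String, FieldConfig>();

  /** @deprecated CharSequence identifiers break map lookups, since e.g.
   *  StringBuilder does not define content-based equals()/hashCode()
   *  (the LUCENE-2855 problem); use {@link #getFieldConfig(String)}. */
  @Deprecated
  public FieldConfig getFieldConfig(CharSequence fieldName) {
    return getFieldConfig(fieldName.toString());  // delegate to the String version
  }

  public FieldConfig getFieldConfig(String fieldName) {
    return configs.get(fieldName);                // String keys compare by value
  }
}
{code}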

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-15 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2324:
-

Attachment: test.out

Are you also merging trunk in? svn up yields a lot of updates.

There are new test failures in: TestSnapshotDeletionPolicy

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, 
 test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-15 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982205#action_12982205
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

The TestStressIndexing2 errors remind me of what I saw when working on 
LUCENE-2680.  I'll take a look.  They weren't there in the previous revisions 
of this branch.

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, 
 test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-trunk - Build # 1427 - Still Failing

2011-01-15 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1427/

All tests passed

Build Log (for compile errors):
[...truncated 16762 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-15 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982213#action_12982213
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

In DW.flushAllThreads we're accessing indexWriter.segmentInfos while we're not 
synced on IW, so the segment infos vector may be changing as we're accessing 
it.  I'm not sure how we can reasonably solve this; I don't think cloning 
segment infos will work.  In trunk, doFlush is sync'ed on IW and so doesn't run 
into these problems.  Perhaps for the flush-all-threads case we should simply 
sync on IW?
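
A compilable sketch of that suggestion, with stub types standing in for the DW internals (hypothetical; not the branch's code):

{code}
import java.io.IOException;
import java.util.List;

class FlushAllSketch {
  interface DWPT { void flush() throws IOException; }  // stands in for DocumentsWriterPerThread

  private final Object indexWriter;     // the IW monitor we'd sync on
  private final List<DWPT> perThreads;

  FlushAllSketch(Object indexWriter, List<DWPT> perThreads) {
    this.indexWriter = indexWriter;
    this.perThreads = perThreads;
  }

  void flushAllThreads() throws IOException {
    synchronized (indexWriter) {        // segmentInfos cannot change underneath us
      for (DWPT dwpt : perThreads) {
        dwpt.flush();                   // each DWPT publishes its segment while IW is held
      }
    }
  }
}
{code}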

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, 
 test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1191) NullPointerException in delta import

2011-01-15 Thread Gunnlaugur Thor Briem (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnlaugur Thor Briem updated SOLR-1191:


Attachment: SOLR-1191.patch

Patch to resolve this. It resolves deltaQuery columns against pk when they 
differ by a prefix (and reports errors more helpfully when no column matches, 
or more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.
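
A hedged sketch of the resolution idea (a hypothetical helper, not the attached patch): match a pk like ds.id to a delta-query column id by comparing the suffix after the last dot, and fail loudly on ambiguity.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PkResolverSketch {
  static String resolvePk(String pk, Set<String> deltaColumns) {
    if (deltaColumns.contains(pk)) return pk;               // exact match wins
    String suffix = pk.substring(pk.lastIndexOf('.') + 1);  // "ds.id" -> "id"
    List<String> matches = new ArrayList<String>();
    for (String col : deltaColumns) {
      if (col.equals(suffix) || col.endsWith("." + suffix)) matches.add(col);
    }
    if (matches.size() != 1) {                              // none or ambiguous: report helpfully
      throw new IllegalStateException("pk '" + pk + "' matched " + matches.size()
          + " delta-query columns: " + matches);
    }
    return matches.get(0);
  }
}
{code}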

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows & Linux.
 Java: 1.6
 DB: MySQL & SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing a few of these NullPointerExceptions during delta imports. Once this 
 happens, delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem, and delta 
 import starts working again.
 Here is the log just before & after the exception:
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed 

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-15 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982220#action_12982220
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

DW.deleteTerms iterates on DWPTs without acquiring the ThreadState.lock; 
instead, DWPT.deleteTerms is synced (on DWPT).  I think if a flush is occurring 
then deletes can get in at the same time?  I don't think BufferedDeletes 
supports that?
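
A compilable sketch of one way to close that window, with stub types (hypothetical; not the branch's code): take each ThreadState's lock before pushing a delete, so no DWPT can be mid-flush while its buffered deletes change.

{code}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class DeleteTermsSketch {
  interface DWPT { void deleteTerms(String... terms); }  // stands in for DocumentsWriterPerThread

  static class ThreadState {
    final ReentrantLock lock = new ReentrantLock();
    final DWPT dwpt;
    ThreadState(DWPT dwpt) { this.dwpt = dwpt; }
  }

  private final List<ThreadState> threadStates;

  DeleteTermsSketch(List<ThreadState> threadStates) { this.threadStates = threadStates; }

  void deleteTerms(String... terms) {
    for (ThreadState state : threadStates) {
      state.lock.lock();                // excludes a concurrent flush of this DWPT
      try {
        state.dwpt.deleteTerms(terms);  // buffered deletes updated under the same lock
      } finally {
        state.lock.unlock();
      }
    }
  }
}
{code}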

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, 
 test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1191) NullPointerException in delta import

2011-01-15 Thread Gunnlaugur Thor Briem (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982221#action_12982221
 ] 

Gunnlaugur Thor Briem commented on SOLR-1191:
-

Neglected to mention: that patch is against branch_3x.

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows & Linux.
 Java: 1.6
 DB: MySQL & SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing a few of these NullPointerExceptions during delta imports. Once this 
 happens, delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem, and delta 
 import starts working again.
 Here is the log just before & after the exception:
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: user rows obtained : 0
 05/27 11:59:29 86987873 INFO  Thread-4162