[jira] [Updated] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767

2011-04-13 Thread wangzhenghang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangzhenghang updated LUCENE-3026:
--

Attachment: LUCENE-3026.patch

> smartcn analyzer throw NullPointer exception when the length of analysed text 
> over 32767
> 
>
> Key: LUCENE-3026
> URL: https://issues.apache.org/jira/browse/LUCENE-3026
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 3.1, 4.0
>Reporter: wangzhenghang
> Attachments: LUCENE-3026.patch
>
>
> This is caused by org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
> makeIndex() method:
>   public List<SegToken> makeIndex() {
>     List<SegToken> result = new ArrayList<SegToken>();
>     int s = -1, count = 0, size = tokenListTable.size();
>     List<SegToken> tokenList;
>     short index = 0;
>     while (count < size) {
>       if (isStartExist(s)) {
>         tokenList = tokenListTable.get(s);
>         for (SegToken st : tokenList) {
>           st.index = index;
>           result.add(st);
>           index++;
>         }
>         count++;
>       }
>       s++;
>     }
>     return result;
>   }
> Here 'short index = 0;' should be 'int index = 0;'. This was reported at 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the 
> author XiaoPingGao has already fixed this bug: 
> http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java
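
For reference, the overflow mechanics behind the 32767 threshold (a minimal 
standalone illustration, not code from the patch):

{code:java}
// Java shorts are 16-bit signed, so Short.MAX_VALUE is 32767
short index = Short.MAX_VALUE;
index++;                    // silently wraps around to -32768
System.out.println(index);  // prints -32768
{code}

Once the counter wraps, every subsequent SegToken gets a negative index, and 
downstream lookups keyed on st.index presumably fail, which surfaces as the 
reported NullPointerException.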

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767

2011-04-13 Thread wangzhenghang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019671#comment-13019671
 ] 

wangzhenghang commented on LUCENE-3026:
---

It's done

> smartcn analyzer throw NullPointer exception when the length of analysed text 
> over 32767
> 
>
> Key: LUCENE-3026
> URL: https://issues.apache.org/jira/browse/LUCENE-3026
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 3.1, 4.0
>Reporter: wangzhenghang
> Attachments: LUCENE-3026.patch
>
>
> This is caused by org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
> makeIndex() method:
>   public List<SegToken> makeIndex() {
>     List<SegToken> result = new ArrayList<SegToken>();
>     int s = -1, count = 0, size = tokenListTable.size();
>     List<SegToken> tokenList;
>     short index = 0;
>     while (count < size) {
>       if (isStartExist(s)) {
>         tokenList = tokenListTable.get(s);
>         for (SegToken st : tokenList) {
>           st.index = index;
>           result.add(st);
>           index++;
>         }
>         count++;
>       }
>       s++;
>     }
>     return result;
>   }
> Here 'short index = 0;' should be 'int index = 0;'. This was reported at 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the 
> author XiaoPingGao has already fixed this bug: 
> http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7082 - Failure

2011-04-13 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7082/

No tests ran.

Build Log (for compile errors):
[...truncated 118 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2467) Custom analyzer load exceptions are not logged.

2011-04-13 Thread Alexander Kistanov (JIRA)
Custom analyzer load exceptions are not logged.
---

 Key: SOLR-2467
 URL: https://issues.apache.org/jira/browse/SOLR-2467
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Alexander Kistanov
Priority: Minor


If any exception occurs while loading a custom analyzer, the following catch 
block is executed:

{code:title=solr/src/java/org/apache/solr/schema/IndexSchema.java}
  } catch (Exception e) {
    throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
        "Cannot load analyzer: "+analyzerName );
  }
{code}

Analyzer load exception "e" is not logged at all.
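
A minimal fix (sketch) would be to pass the caught exception along as the 
cause, so it shows up in the logs with its stack trace:

{code:title=proposed fix (sketch)}
  } catch (Exception e) {
    // passing 'e' lets SolrException log the original cause
    throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
        "Cannot load analyzer: "+analyzerName, e );
  }
{code}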


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019652#comment-13019652
 ] 

Koji Sekiguchi commented on SOLR-2436:
--

The patch looks good, Tommaso!

If it is committed, it will break back-compat. I think we need a note for 
users in CHANGES.txt.

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019649#comment-13019649
 ] 

Koji Sekiguchi commented on SOLR-2436:
--

Hi Uwe,

The problematic snippet regarding XInclude handling was first introduced in my 
patch, which I borrowed from DIH. When I did it, I missed something. Thank you 
for the warning.

Now that we are trying to embed the config in the update processor instead of 
loading it from outside solrconfig.xml, the problematic snippet is gone.

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no effect

2011-04-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019637#comment-13019637
 ] 

Robert Muir commented on LUCENE-3022:
-

This sounds like a bug, do you want to try your hand at contributing a patch?

See http://wiki.apache.org/lucene-java/HowToContribute for some instructions.


> DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no effect
> -
>
> Key: LUCENE-3022
> URL: https://issues.apache.org/jira/browse/LUCENE-3022
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 2.9.4, 3.1
>Reporter: Johann Höchtl
>Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> When using the DictionaryCompoundWordTokenFilter with a German dictionary, I 
> got strange behaviour:
> The German word "streifenbluse" (blouse with stripes) was decompounded to 
> "streifen" (stripe) and "reifen" (tire), which makes no sense at all.
> I thought the flag onlyLongestMatch would fix this, because "streifen" is 
> longer than "reifen", but it had no effect.
> So I reviewed the source code and found the problem (longestMatchToken is 
> reset for every start offset, so the filter keeps the longest match per 
> starting position, "streifen" and "reifen" starting at different offsets, 
> rather than the single longest match for the whole token):
> [code]
> protected void decomposeInternal(final Token token) {
>   // Only words longer than minWordSize get processed
>   if (token.length() < this.minWordSize) {
>     return;
>   }
> 
>   char[] lowerCaseTermBuffer=makeLowerCaseCopy(token.buffer());
> 
>   for (int i=0;i<token.length()-this.minSubwordSize;++i) {
>     Token longestMatchToken=null;
>     for (int j=this.minSubwordSize-1;j<this.maxSubwordSize;++j) {
>       if(i+j>token.length()) {
>         break;
>       }
>       if(dictionary.contains(lowerCaseTermBuffer, i, j)) {
>         if (this.onlyLongestMatch) {
>           if (longestMatchToken!=null) {
>             if (longestMatchToken.length()<j) {
>               longestMatchToken=createToken(i,j,token);
>             }
>           } else {
>             longestMatchToken=createToken(i,j,token);
>           }
>         } else {
>           tokens.add(createToken(i,j,token));
>         }
>       }
>     }
>     if (this.onlyLongestMatch && longestMatchToken!=null) {
>       tokens.add(longestMatchToken);
>     }
>   }
> }
> [/code]
> should be changed to 
> [code]
> protected void decomposeInternal(final Token token) {
>   // Only words longer than minWordSize get processed
>   if (token.termLength() < this.minWordSize) {
>     return;
>   }
>   char[] lowerCaseTermBuffer=makeLowerCaseCopy(token.termBuffer());
>   Token longestMatchToken=null;
>   for (int i=0;i<token.termLength()-this.minSubwordSize;++i) {
>     for (int j=this.minSubwordSize-1;j<this.maxSubwordSize;++j) {
>       if(i+j>token.termLength()) {
>         break;
>       }
>       if(dictionary.contains(lowerCaseTermBuffer, i, j)) {
>         if (this.onlyLongestMatch) {
>           if (longestMatchToken!=null) {
>             if (longestMatchToken.termLength()<j) {
>               longestMatchToken=createToken(i,j,token);
>             }
>           } else {
>             longestMatchToken=createToken(i,j,token);
>           }
>         } else {
>           tokens.add(createToken(i,j,token));
>         }
>       }
>     }
>   }
>   if (this.onlyLongestMatch && longestMatchToken!=null) {
>     tokens.add(longestMatchToken);
>   }
> }
> [/code]
> so that only the longest token is actually indexed and the onlyLongestMatch 
> flag behaves as expected.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767

2011-04-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019636#comment-13019636
 ] 

Robert Muir commented on LUCENE-3026:
-

This sounds like a bug, do you want to try your hand at contributing a patch?

See http://wiki.apache.org/lucene-java/HowToContribute for some instructions.


> smartcn analyzer throw NullPointer exception when the length of analysed text 
> over 32767
> 
>
> Key: LUCENE-3026
> URL: https://issues.apache.org/jira/browse/LUCENE-3026
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 3.1, 4.0
>Reporter: wangzhenghang
>
> This is caused by org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
> makeIndex() method:
>   public List<SegToken> makeIndex() {
>     List<SegToken> result = new ArrayList<SegToken>();
>     int s = -1, count = 0, size = tokenListTable.size();
>     List<SegToken> tokenList;
>     short index = 0;
>     while (count < size) {
>       if (isStartExist(s)) {
>         tokenList = tokenListTable.get(s);
>         for (SegToken st : tokenList) {
>           st.index = index;
>           result.add(st);
>           index++;
>         }
>         count++;
>       }
>       s++;
>     }
>     return result;
>   }
> Here 'short index = 0;' should be 'int index = 0;'. This was reported at 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the 
> author XiaoPingGao has already fixed this bug: 
> http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767

2011-04-13 Thread wangzhenghang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangzhenghang updated LUCENE-3026:
--

Summary: smartcn analyzer throw NullPointer exception when the length of 
analysed text over 32767  (was: smartcn analysis throw NullPointer exception 
when the length of analysed text over 32767)

> smartcn analyzer throw NullPointer exception when the length of analysed text 
> over 32767
> 
>
> Key: LUCENE-3026
> URL: https://issues.apache.org/jira/browse/LUCENE-3026
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 3.1, 4.0
>Reporter: wangzhenghang
>
> This is caused by org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
> makeIndex() method:
>   public List<SegToken> makeIndex() {
>     List<SegToken> result = new ArrayList<SegToken>();
>     int s = -1, count = 0, size = tokenListTable.size();
>     List<SegToken> tokenList;
>     short index = 0;
>     while (count < size) {
>       if (isStartExist(s)) {
>         tokenList = tokenListTable.get(s);
>         for (SegToken st : tokenList) {
>           st.index = index;
>           result.add(st);
>           index++;
>         }
>         count++;
>       }
>       s++;
>     }
>     return result;
>   }
> Here 'short index = 0;' should be 'int index = 0;'. This was reported at 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the 
> author XiaoPingGao has already fixed this bug: 
> http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2011-04-13 Thread David Christianson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019600#comment-13019600
 ] 

David Christianson commented on SOLR-1782:
--

Is a fix anticipated? Is it already in the next version, which we should just 
download? Are there other issues?

We are currently running 1.4.1 (955469) and are seeing this problem for some 
applications. The patch given does not apply out of the box to 1.4.1 (955469) 
without a few tweaks, but so far it appears to work.

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch, index.rar
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater than the number of documents.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7075 - Failure

2011-04-13 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7075/

1 tests failed.
REGRESSION:  org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
at java.lang.StringBuffer.append(StringBuffer.java:337)
at 
java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
at 
org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
at 
org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
at 
org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)




Build Log (for compile errors):
[...truncated 5276 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7081 - Failure

2011-04-13 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7081/

1 tests failed.
REGRESSION:  org.apache.solr.handler.TestReplicationHandler.testBackup

Error Message:
Backup success not detected:  
0123.34 
KB/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/test-results/temp/2/org.apache.solr.handler.TestReplicationHandler$SolrInstance-1302724843266/master/data/index130272484362716_9.tis2.fnx_9.nrm_9.fnm_9.fdt_9.tii_9.fdx_9.frqsegments_gtruefalse130272484362716schema-replication2.xml:schema.xmlcommittrue130272484362716This 
response format is experimental.  It is likely to change in the future. 
 

Stack Trace:
junit.framework.AssertionFailedError: Backup success not detected:

0123.34 
KB/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/test-results/temp/2/org.apache.solr.handler.TestReplicationHandler$SolrInstance-1302724843266/master/data/index130272484362716_9.tis2.fnx_9.nrm_9.fnm_9.fdt_9.tii_9.fdx_9.frqsegments_gtruefalse130272484362716schema-replication2.xml:schema.xmlcommittrue130272484362716This 
response format is experimental.  It is likely to change in the future.


at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)

0123.34 
KB/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/test-results/temp/2/org.apache.solr.handler.TestReplicationHandler$SolrInstance-1302724843266/master/data/index130272484362716_9.tis2.fnx_9.nrm_9.fnm_9.fdt_9.tii_9.fdx_9.frqsegments_gtruefalse130272484362716schema-replication2.xml:schema.xmlcommittrue130272484362716This 
response format is experimental.  It is likely to change in the future.


at 
org.apache.solr.handler.TestReplicationHandler.testBackup(TestReplicationHandler.java:683)




Build Log (for compile errors):
[...truncated 8741 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: An IDF variation with penalty for very rare terms

2011-04-13 Thread Marvin Humphrey
On Wed, Apr 13, 2011 at 01:01:09AM +0400, Earwin Burrfoot wrote:
> Excuse me for somewhat of an off-topic question, but has anybody ever 
> seen/used -subj-?
> Something that looks like like http://dl.dropbox.com/u/920413/IDFplusplus.png
> Traditional log(N/x) tail, but when nearing zero freq, instead of
> going to +inf you do a nice round bump (with controlled
> height/location/sharpness) and drop down to -inf (or zero).
 
I haven't used that technique, nor can I quote academic literature blessing
it.  Nevertheless, what you're doing makes sense to me.

> Rationale is that - most good, discriminating terms are found in at
> least a certain percentage of your documents, but there are lots of
> mostly unique crapterms, which at some collection sizes stop being
> strictly unique and with IDF's help explode your scores.

So you've designed a heuristic that allows you to filter a certain kind of
noise.  It sounds a lot like how people tune length normalization to adapt to
their document collections.  Many tuning techniques are corpus-specific.
Whatever works, works!
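
For concreteness, one way to get the shape described above (a rough sketch of 
the "drop to zero" variant; the pivot/sharpness parameter names are invented 
here, not anything from Lucene):

    // log(N/df) tail, multiplied by a smooth gate that collapses as df -> 0
    static double bumpIdf(double df, double N, double pivot, double sharpness) {
      double idf = Math.log(N / df);
      // logistic gate: ~1 for df well above pivot, ~0 as df nears zero
      double gate = 1.0 / (1.0 + Math.exp(-sharpness * (df - pivot)));
      return idf * gate;  // round bump near pivot, then decays toward zero
    }

The pivot controls where the bump sits, and the sharpness controls how 
abruptly very rare terms get penalized.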

Marvin Humphrey


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019492#comment-13019492
 ] 

Uwe Schindler commented on SOLR-2436:
-

bq. Or perhaps we need a utility method and pointer to that?

That's a little bit out of scope. I was thinking about that, too, but there are 
too many different ways to parse XML (StAX, DocumentBuilder), so I did not do 
this. The new Resolver and Logging classes already handle those interface 
implementations for all types of parsing; just initializing the parsers should 
be the caller's responsibility. XSL is different as well, so all of them use 
ResourceLoader, but their code differs.

It is just important that you do *not* use the standard copy-paste code from 
XML how-tos. A code example similar to the one above can be shown for StAX and 
Transformers/Templates. I see no need for helper classes.

I will add it to the wiki later.

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: PayloadProcessorProvider Usage

2011-04-13 Thread Michael McCandless
Hmm... on option 1, how would you run into merges of target segments?
I think we currently do one big merge of the source segments, into one
segment in the target index?

But, the issue on option 2 is truly annoying.  We have the same
problem for apps that want to "upgrade" their index from 3.x to the
4.0 format (for example).

Maybe we need a new (expert) method... remergeIndex?
mergeAllSegments?  rebuildIndex?

Mike

http://blog.mikemccandless.com

On Wed, Apr 13, 2011 at 1:43 PM, Shai Erera  wrote:
> Hey,
>
> In Lucene 3.1 we've introduced PayloadProcessorProvider which allows you to
> rewrite payloads of terms during merge. The main scenario is when you merge
> indexes, and you want to rewrite/remap payloads of the incoming indexes, but
> one can certainly use it to rewrite the payloads of a term, in a given
> index.
> When we worked on it, we thought of two ways the user can rewrite payloads
> when he merges indexes:
>
> 1) Set PPP on the target IW, call addIndexes(IndexReader), while PPP will be
> applied on the incoming directories only.
> 2) Set PPP on the source IW, call IW.optimize(), then use
> targetIW.addIndexes(Directory).
>
> The latter is better since in both cases the incoming segments are rewritten
> anyway, however in the first case you might run into merging segments of the
> target index as well, something you might want to avoid (that was the
> purpose of optimizing addIndexes(Directory)).
>
> But it turns out the latter is not so easy to achieve. If the source index
> has only 1 segment (at least in my case, ~100% of the time), then calling
> optimize() doesn't do anything because the MP thinks the index is already
> optimized and returns no MergeSpec. To overcome this, I wrote a
> ForceOptimizeMP which extends LogMP and forces optimize even if there is
> only one segment.
>
> Another option is to set the noCFSRatio to 1.0 and flip the useCompoundFile
> flag (i.e. if the source is compound, create non-compound, and vice versa). That can
> work too, but I don't think it's very good, because the source index will be
> changed from compound to non (or vice versa), which is something that the
> app didn't want.
>
> So I think option 1 is better, but I wanted to ask if someone knows of a
> better way to achieve this?
>
> Shai

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019476#comment-13019476
 ] 

Mark Miller commented on SOLR-2436:
---

Or perhaps we need a utility method and pointer to that?

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019474#comment-13019474
 ] 

Mark Miller commented on SOLR-2436:
---

bq. Maybe we should add my last comment into the Wiki: 

+1

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019473#comment-13019473
 ] 

Uwe Schindler commented on SOLR-2436:
-

Maybe we should add my last comment to the Wiki as "Howto load XML from Solr's 
config resources", to prevent broken code from appearing again (if this is no 
longer an issue here, that is fine; I was just alarmed). I had a hard time 
fixing all the XML handling in Solr (DIH is still broken with charsets), but 
XInclude now works as expected everywhere.

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019470#comment-13019470
 ] 

Uwe Schindler edited comment on SOLR-2436 at 4/13/11 6:14 PM:
--

Here is the new way to load XML from ResourceLoaders in Solr (taken from 
Config). This code also intercepts errors and warnings and logs them correctly 
(parsers tend to write them to System.err):

{code:java}
public static final Logger log = LoggerFactory.getLogger(Config.class);
private static final XMLErrorLogger xmllog = new XMLErrorLogger(log);

...

final InputSource is = new InputSource(loader.openConfig(name));
is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));

final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// only enable xinclude, if a SystemId is available
if (is.getSystemId() != null) {
  try {
    dbf.setXIncludeAware(true);
    dbf.setNamespaceAware(true);
  } catch (UnsupportedOperationException e) {
    log.warn(name + " XML parser doesn't support XInclude option");
  }
}

final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new SystemIdResolver(loader));
db.setErrorHandler(xmllog);
try {
  doc = db.parse(is);
} finally {
  // some XML parsers are broken and don't close the byte stream
  // (but they should according to spec)
  IOUtils.closeQuietly(is.getByteStream());
}
{code}

  was (Author: thetaphi):
Here is the new way to load XML from ResourceLoaders in Solr (taken from 
Config). This code also intercepts errors and warnings and logs them correctly 
(parsers tend to write them to System.err):

{code:java}
is = new InputSource(loader.openConfig(name));
is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));

// only enable xinclude, if a SystemId is available
if (is.getSystemId() != null) {
  try {
    dbf.setXIncludeAware(true);
    dbf.setNamespaceAware(true);
  } catch (UnsupportedOperationException e) {
    log.warn(name + " XML parser doesn't support XInclude option");
  }
}

final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new SystemIdResolver(loader));
db.setErrorHandler(xmllog);
try {
  doc = db.parse(is);
} finally {
  // some XML parsers are broken and don't close the byte stream
  // (but they should according to spec)
  IOUtils.closeQuietly(is.getByteStream());
}
{code}
  
> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019470#comment-13019470
 ] 

Uwe Schindler commented on SOLR-2436:
-

Here is the new way to load XML from ResourceLoaders in Solr (taken from 
Config). This code also intercepts errors and warnings and logs them correctly 
(parsers tend to write them to System.err):

{code:java}
is = new InputSource(loader.openConfig(name));
is.setSystemId(SystemIdResolver.createSystemIdFromResourceName(name));

// only enable xinclude, if a SystemId is available
if (is.getSystemId() != null) {
  try {
    dbf.setXIncludeAware(true);
    dbf.setNamespaceAware(true);
  } catch (UnsupportedOperationException e) {
    log.warn(name + " XML parser doesn't support XInclude option");
  }
}

final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new SystemIdResolver(loader));
db.setErrorHandler(xmllog);
try {
  doc = db.parse(is);
} finally {
  // some XML parsers are broken and don't close the byte stream
  // (but they should according to spec)
  IOUtils.closeQuietly(is.getByteStream());
}
{code}

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019465#comment-13019465
 ] 

Uwe Schindler commented on SOLR-2436:
-

I just looked at the patch; is the SOLR-2436_2.patch still active, or has it 
been replaced by Koji's?

I ask because:
{noformat}
+try{
+  final InputSource is = new InputSource(loader.openConfig(uimaConfigFile));
+  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+  // only enable xinclude, if SystemId is present (makes no sense otherwise)
+  if (is.getSystemId() != null) {
+    try {
+      dbf.setXIncludeAware(true);
+      dbf.setNamespaceAware(true);
+    } catch( UnsupportedOperationException e ) {
+      LOG.warn( "XML parser doesn't support XInclude option" );
+    }
+  }
{noformat}

This XInclude handling is broken: the if-clause never gets executed, because 
the InputSource is never given a SystemId. We have a new framework that makes 
XML loading from ResourceLoaders work correctly, even with relative paths! 
Just look at the example committed during the cleanup issue, or at other 
places in Solr where DocumentBuilders or XMLStreamReaders are instantiated. 
The new Solr way to load such files is a special URI scheme that is internally 
used to resolve ResourceLoader resources correctly (see SOLR-1656).

The latest patch looks fine; it embeds the config directly, which seems much 
more consistent.

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath <config>. I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



PayloadProcessorProvider Usage

2011-04-13 Thread Shai Erera
Hey,

In Lucene 3.1 we've introduced PayloadProcessorProvider which allows you to
rewrite payloads of terms during merge. The main scenario is when you merge
indexes, and you want to rewrite/remap payloads of the incoming indexes, but
one can certainly use it to rewrite the payloads of a term, in a given
index.

When we worked on it, we thought of two ways the user can rewrite payloads
when he merges indexes:

1) Set PPP on the target IW, call addIndexes(IndexReader), while PPP will be
applied on the incoming directories only.
2) Set PPP on the source IW, call IW.optimize(), then use
targetIW.addIndexes(Directory).

The latter is better since in both cases the incoming segments are rewritten
anyway, however in the first case you might run into merging segments of the
target index as well, something you might want to avoid (that was the
purpose of optimizing addIndexes(Directory)).

But it turns out the latter is not so easy to achieve. If the source index
has only 1 segment (at least in my case, ~100% of the time), then calling
optimize() doesn't do anything because the MP thinks the index is already
optimized and returns no MergeSpec. To overcome this, I wrote a
ForceOptimizeMP which extends LogMP and forces optimize even if there is
only one segment.

Another option is to set the noCFSRatio to 1.0 and flip the useCompoundFile
flag (i.e. if the source is compound, create non-compound, and vice versa). That can
work too, but I don't think it's very good, because the source index will be
changed from compound to non (or vice versa), which is something that the
app didn't want.

So I think option 1 is better, but I wanted to ask if someone knows of a
better way to achieve this?
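
For reference, a sketch of what such a ForceOptimizeMP can look like (this is 
written from memory of the 3.1-era MergePolicy API, not the actual class, so 
the exact findMergesForOptimize/OneMerge signatures may need adjusting):

    public class ForceOptimizeMergePolicy extends LogByteSizeMergePolicy {
      @Override
      public MergeSpecification findMergesForOptimize(SegmentInfos infos,
          int maxNumSegments, Set<SegmentInfo> segmentsToOptimize)
          throws IOException {
        MergeSpecification spec =
            super.findMergesForOptimize(infos, maxNumSegments, segmentsToOptimize);
        if (spec == null && infos.size() == 1) {
          // LogMP considers a single-segment index already optimized; force a
          // rewrite anyway so the PayloadProcessorProvider gets applied
          spec = new MergeSpecification();
          spec.add(new OneMerge(infos.range(0, 1)));
        }
        return spec;
      }
    }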

Shai


[jira] [Commented] (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-04-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019421#comment-13019421
 ] 

Mark Miller commented on LUCENE-2939:
-

Okay - I'm going to commit to trunk shortly.

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.1.1, 3.2, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch, 
> LUCENE-2939.patch
>
>
> huge documents can be drastically slower than need be because the entire 
> field is added to the memory index
> this cost can be greatly reduced in many cases if we try and respect 
> maxDocCharsToAnalyze
> things can be improved even further by respecting this setting with 
> CachingTokenStream

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-64) strict hierarchical facets

2011-04-13 Thread Relephant (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019399#comment-13019399
 ] 

Relephant edited comment on SOLR-64 at 4/13/11 4:04 PM:


Hi all, we have just tried to apply solr-64 to 3.1. Attached 
"SOLR-64_3.1.0.patch". 

Hope that helps.

  was (Author: relephant):
Hi all, we have just tried to apply solr-64 to 3.1. Attached 
"SOLR-64_3.1.0.diff". 

Hope that helps.
  
> strict hierarchical facets
> --
>
> Key: SOLR-64
> URL: https://issues.apache.org/jira/browse/SOLR-64
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
>Assignee: Koji Sekiguchi
> Fix For: 4.0
>
> Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
> SOLR-64.patch, SOLR-64_3.1.0.patch
>
>
> Strict Facet Hierarchies... each tag has at most one parent (a tree).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-64) strict hierarchical facets

2011-04-13 Thread Relephant (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Relephant updated SOLR-64:
--

Attachment: SOLR-64_3.1.0.patch

> strict hierarchical facets
> --
>
> Key: SOLR-64
> URL: https://issues.apache.org/jira/browse/SOLR-64
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
>Assignee: Koji Sekiguchi
> Fix For: 4.0
>
> Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
> SOLR-64.patch, SOLR-64_3.1.0.patch
>
>
> Strict Facet Hierarchies... each tag has at most one parent (a tree).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-64) strict hierarchical facets

2011-04-13 Thread Relephant (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Relephant updated SOLR-64:
--

Attachment: (was: SOLR-64_3.1.0.diff)

> strict hierarchical facets
> --
>
> Key: SOLR-64
> URL: https://issues.apache.org/jira/browse/SOLR-64
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
>Assignee: Koji Sekiguchi
> Fix For: 4.0
>
> Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
> SOLR-64.patch, SOLR-64_3.1.0.patch
>
>
> Strict Facet Hierarchies... each tag has at most one parent (a tree).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-64) strict hierarchical facets

2011-04-13 Thread Relephant (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Relephant updated SOLR-64:
--

Attachment: SOLR-64_3.1.0.diff

Hi all, we have just tried to apply solr-64 to 3.1. Attached 
"SOLR-64_3.1.0.diff". 

Hope that helps.

> strict hierarchical facets
> --
>
> Key: SOLR-64
> URL: https://issues.apache.org/jira/browse/SOLR-64
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
>Assignee: Koji Sekiguchi
> Fix For: 4.0
>
> Attachments: SOLR-64.patch, SOLR-64.patch, SOLR-64.patch, 
> SOLR-64.patch, SOLR-64_3.1.0.diff
>
>
> Strict Facet Hierarchies... each tag has at most one parent (a tree).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Numerical ids for terms?

2011-04-13 Thread Gregor Heinrich

Thanks Toke and Kirill -- I guess that's the way to go (at least until v4.0).

Best regards

gregor

On 4/13/11 3:42 PM, Toke Eskildsen wrote:

On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote:

Hi -- has there been any effort to create a numerical representation of Lucene
indices? That is, to use the Lucene Directory backend as a large term-document
matrix at index level. As this would require a bijective mapping between terms
(per-field, as customary in Lucene) and a numerical index (integer, monotonic
from 0 to numTerms()-1), I guess this requires some special modifications
to the Lucene core.

Maybe you're thinking about something like TermsEnum?
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/TermsEnum.html
It provides ordinal-access to terms, represented with longs. In order to
make the access at index-level rather than segment-level you will have
to perform a merge of the ordinals from the different segments.

Unfortunately it is optional whether the codec supports ordinal-based
terms access and the default codec does not, so you will have to
explicitly select a codec when you build your index.
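
To make that concrete, per-segment term/ordinal enumeration looks roughly like 
this (a sketch against the flex/trunk API of the time, written from memory, so 
method names may differ in detail; "body" is just an example field):

    // ords are segment-local, so enumerate per segment
    for (IndexReader segment : reader.getSequentialSubReaders()) {
      Terms terms = segment.fields().terms("body");
      if (terms == null) continue;
      TermsEnum te = terms.iterator();
      BytesRef term;
      while ((term = te.next()) != null) {
        // ord() is optional: codecs without ordinal support throw
        // UnsupportedOperationException here, as noted above
        System.out.println(te.ord() + " -> " + term.utf8ToString());
      }
    }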


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-04-13 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019391#comment-13019391
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

In the current patch, I'm copying the parallel array for the end of a term's 
postings per reader [re]open.  However in the case where we're opening a reader 
after each document is indexed, this is wasteful.  We can simply queue the term 
ids from the last indexed document, and only copy the newly updated values over 
to the 'read' only consistent parallel array.

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch
>
>
> In order to offer users near-realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3026) smartcn analysis throw NullPointer exception when the length of analysed text over 32767

2011-04-13 Thread wangzhenghang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangzhenghang updated LUCENE-3026:
--

Description: 
This is caused by org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
makeIndex() method:
  public List<SegToken> makeIndex() {
    List<SegToken> result = new ArrayList<SegToken>();
    int s = -1, count = 0, size = tokenListTable.size();
    List<SegToken> tokenList;
    short index = 0;
    while (count < size) {
      if (isStartExist(s)) {
        tokenList = tokenListTable.get(s);
        for (SegToken st : tokenList) {
          st.index = index;
          result.add(st);
          index++;
        }
        count++;
      }
      s++;
    }
    return result;
  }

Here 'short index = 0;' should be 'int index = 0;'. This was reported at 
http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and 
http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the 
author XiaoPingGao has already fixed this bug: 
http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java

  was:
That all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
makeIndex() method:
  public List<SegToken> makeIndex() {
    List<SegToken> result = new ArrayList<SegToken>();
    int s = -1, count = 0, size = tokenListTable.size();
    List<SegToken> tokenList;
    short index = 0;
    while (count < size) {
      if (isStartExist(s)) {
        tokenList = tokenListTable.get(s);
        for (SegToken st : tokenList) {
          st.index = index;
          result.add(st);
          index++;
        }
        count++;
      }
      s++;
    }
    return result;
  }

'short index = 0;' should be 'int index = 0;'. And that's reported here 
http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2, 
http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, the 
author XiaoPingGao have already fixed this 
bug:http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java


> smartcn analysis throw NullPointer exception when the length of analysed text 
> over 32767
> 
>
> Key: LUCENE-3026
> URL: https://issues.apache.org/jira/browse/LUCENE-3026
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/analyzers
>Affects Versions: 3.1, 4.0
>Reporter: wangzhenghang
>
> This is caused by org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
> makeIndex() method:
>   public List<SegToken> makeIndex() {
>     List<SegToken> result = new ArrayList<SegToken>();
>     int s = -1, count = 0, size = tokenListTable.size();
>     List<SegToken> tokenList;
>     short index = 0;
>     while (count < size) {
>       if (isStartExist(s)) {
>         tokenList = tokenListTable.get(s);
>         for (SegToken st : tokenList) {
>           st.index = index;
>           result.add(st);
>           index++;
>         }
>         count++;
>       }
>       s++;
>     }
>     return result;
>   }
> Here 'short index = 0;' should be 'int index = 0;'. This was reported at 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and 
> http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the 
> author XiaoPingGao has already fixed this bug: 
> http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3026) smartcn analysis throw NullPointer exception when the length of analysed text over 32767

2011-04-13 Thread wangzhenghang (JIRA)
smartcn analysis throw NullPointer exception when the length of analysed text 
over 32767


 Key: LUCENE-3026
 URL: https://issues.apache.org/jira/browse/LUCENE-3026
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang


That all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's 
makeIndex() method:
  public List<SegToken> makeIndex() {
    List<SegToken> result = new ArrayList<SegToken>();
    int s = -1, count = 0, size = tokenListTable.size();
    List<SegToken> tokenList;
    short index = 0;
    while (count < size) {
      if (isStartExist(s)) {
        tokenList = tokenListTable.get(s);
        for (SegToken st : tokenList) {
          st.index = index;
          result.add(st);
          index++;
        }
        count++;
      }
      s++;
    }
    return result;
  }

'short index = 0;' should be 'int index = 0;'. And that's reported here 
http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2, 
http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, the 
author XiaoPingGao have already fixed this 
bug:http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



GSoC: LUCENE-2308: Separately specify a field's type

2011-04-13 Thread Nikola Tanković
Hi all,

if everything goes well, I'll be delighted to be part of this project this
summer together with my assigned mentor Mike. My task will be to introduce
new classes to the Lucene core which will make it possible to separate a
Field's Lucene properties from its value
(https://issues.apache.org/jira/browse/LUCENE-2308).

As you can assume, this will largely impact Lucene & Solr, so we need to think
this through thoroughly.

Changes will include:

   - Introduction of a FieldType class that will hold all the extra
   properties now stored inside a Field instance, other than the field value
   itself.
   - New FieldTypeAttribute interface will be added to handle extension with
   new field properties, inspired by IndexWriterConfig.
   - Refactoring and dividing of settings for term frequency and positioning
   can also be done (LUCENE-2048).
   - Discuss possible effects of the completion of LUCENE-2310 on this
   project.
   - Adequate Factory class for easier configuration of new Field instances
   together with manually added new FieldTypeAttributes.
   - FieldType, once instantiated, is read-only; only the field's value can be
   changed (see the sketch after this list).
   - Simple hierarchy of Field classes with core properties logically
   predefaulted. E.g.:
  - NumberField,
  - StringField,
  - TextField,
  - NonIndexedField
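
A purely illustrative sketch of the direction (none of these classes exist
yet; the names follow the proposal above, not any final API):

   // properties are fixed up front; the instance is read-only afterwards
   FieldType type = new FieldType(/* indexed */ true, /* stored */ false,
                                  /* tokenized */ true);
   // only the value varies from document to document:
   Field title = new Field("title", "Lucene in Action", type);
   Field body  = new Field("body",  "...",              type);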


My questions and issues:

   - Backward compatibility? Will this go to Lucene 3.0?
   - What is the best way to break this into small baby steps?


Kindly,
Nikola Tanković


Re: Patch for http_proxy support in solr-ruby client

2011-04-13 Thread Duncan Robertson
Thanks Erik,

I hadn't seen RSolr, and it looks like it fixes all the problems I was having.
Maybe rather than keeping many solutions, I'll just take a look at this one.

Duncan


On 13/04/2011 14:51, "Erik Hatcher"  wrote:

> Duncan -
> 
> I'm the original creator of solr-ruby and put it under Solr's svn.  But many
> folks are now using RSolr, and even in our own (JRuby-based product) we use
> simply Net::HTTP and not a library like solr-ruby or RSolr.
> 
> I don't personally have incentive to continue to maintain solr-ruby, so maybe
> your fork is now "official"?   Though the git craze has made me feel weary
> because so many official versions are simply someone's personal fork.
> 
> We can pull solr-ruby from Solr's svn eventually, as something else more
> official takes its place.
> 
> Erik
> 
> 
> 
> On Apr 13, 2011, at 04:13 , Duncan Robertson wrote:
> 
>> Hi Otis,
>> 
>> The fork you're talking about is mine! But the repo I forked is not official, so
>> I am trying to find out where the official version is so I can patch it.
>> 
>> D
>> 
>> 
>> On 13/04/2011 04:45, "Otis Gospodnetic"  wrote:
>> 
>>> Hi,
>>> 
>>> Hm, maybe you are asking where solr-ruby actually lives and is being
>>> developed?
>>> I'm not sure.  I see it under solr/client/ruby/solr-ruby (no new development
>>> in 
>>> ages?), but I also see an *active* solr-ruby fork over on
>>> https://github.com/bbcrd/solr-ruby .  So if you want to contribute to
>>> solr-ruby 
>>> on Github, get yourself a Github account, fork that solr-ruby, make your
>>> change, 
>>> and submit it via the pull request.  This is separate from Solr @ Apache.
>>> 
>>> Otis
>>> 
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>> 
>>> 
>>> 
>>> - Original Message 
 From: Duncan Robertson 
 To: dev@lucene.apache.org
 Sent: Tue, April 12, 2011 4:36:17 AM
 Subject: Patch for http_proxy support in solr-ruby client
 
 Hi,
 
 I have a patch for adding http_proxy support to the solr-ruby client.  I
 thought the project was managed via Github, but this turns out not to be
 the
 case. Is the process the same as for Solr itself?
 
 https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e
 
 Best,
 Duncan
 
 
 http://www.bbc.co.uk/
 This  e-mail (and any attachments) is confidential and may contain personal
 views  which are not the views of the BBC unless specifically stated.
 If you have  received it in error, please delete it from your system.
 Do not use, copy or  disclose the information in any way nor act in
 reliance
 on 
 it and notify the  sender immediately.
 Please note that the BBC monitors e-mails sent or  received.
 Further communication will signify your consent to  this.
 
 
 -
 To  unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>> 
>> 
>> 
>> http://www.bbc.co.uk/
>> This e-mail (and any attachments) is confidential and may contain personal
>> views which are not the views of the BBC unless specifically stated.
>> If you have received it in error, please delete it from your system.
>> Do not use, copy or disclose the information in any way nor act in reliance
>> on it and notify the sender immediately.
>> Please note that the BBC monitors e-mails sent or received.
>> Further communication will signify your consent to this.
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-2436:
--

Attachment: SOLR-2436-3.patch

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath . I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-04-13 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019377#comment-13019377
 ] 

Tommaso Teofili commented on SOLR-2436:
---

Hello Koji,
your patch seems fine to me from the functional point of view.

One thing, though: I don't think SolrUIMAConfigurationReader should be
emptied. Rather than removing it, I would assign it the single responsibility
of reading the args, dropping the previous explicit Node traversal and, as you
did, doing it the "Solr way".
I also made some fixes to remove warnings while getting objects from the
NamedList.
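
For illustration, a minimal sketch of reading args that way (class and method
names here are hypothetical; only NamedList.get(String) is assumed from the
Solr API):

    import org.apache.solr.common.util.NamedList;

    public class UimaArgsReaderSketch {
      // Reads one named arg, tolerating a missing entry; callers decide
      // whether null is acceptable for the given setting.
      public static String getStringArg(NamedList<?> args, String name) {
        Object value = args.get(name);  // NamedList.get(String) returns the stored value
        return value == null ? null : value.toString();
      }
    }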

> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, 
> SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath . I think it should 
> move to uima's update processor tag.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



need help in constructing a query

2011-04-13 Thread Ramamurthy, Premila
I need help constructing a Solr query.

I need the values for a field, but only values that do not contain an
embedded space; the value of the indexed field should not have an embedded
space.


Please help.

Thanks
Premila


[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7062 - Failure

2011-04-13 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7062/

1 tests failed.
REGRESSION:  org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
at java.lang.StringBuffer.append(StringBuffer.java:337)
at 
java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
at 
org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
at 
org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
at 
org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1076)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1008)




Build Log (for compile errors):
[...truncated 9243 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-13 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019370#comment-13019370
 ] 

Jason Rutherglen commented on LUCENE-2956:
--

Simon, nice work.  I agree with Michael B. that the deletes are super complex.
We had discussed using sequence ids for all segments (not just the RT-enabled
DWPT ones); however, we never worked out a specification, e.g., for things like
wrap-around if a primitive short[] was used.

Shall we start again on LUCENE-2312?  I think we still need/want to use
sequence ids there.  The RT DWPTs shouldn't have so many documents that using a
long[] for the sequence ids would consume too much RAM, should they?
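
To illustrate the wrap-around concern, a toy sketch (purely hypothetical, not
code from the branch):

    // One sequence id per buffered doc: a delete applies to a doc only if the
    // doc was indexed before the delete was issued. A long[] never wraps in
    // practice, whereas a short would overflow after 32767 increments.
    class SequenceIds {
      private final long[] docSeqIds;
      private long nextSeq = 1;

      SequenceIds(int maxBufferedDocs) {
        this.docSeqIds = new long[maxBufferedDocs];
      }

      void onDocAdded(int docId) {
        docSeqIds[docId] = nextSeq++;
      }

      boolean deleteApplies(int docId, long deleteSeq) {
        return docSeqIds[docId] < deleteSeq;
      }
    }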

> Support updateDocument() with DWPTs
> ---
>
> Key: LUCENE-2956
> URL: https://issues.apache.org/jira/browse/LUCENE-2956
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2956.patch, LUCENE-2956.patch
>
>
> With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
> the delete part of an updateDocument() is flushed and committed separately 
> from the corresponding new document.
> We need to make sure that updateDocument() is always an atomic operation from 
> a IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
> details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: TestIndexWriterDelete#testUpdatesOnDiskFull can false fail

2011-04-13 Thread Michael McCandless
+1

Mike

http://blog.mikemccandless.com

On Wed, Apr 13, 2011 at 5:58 AM, Simon Willnauer
 wrote:
> In TestIndexWriterDelete#testUpdatesOnDiskFull, especially between lines
> 538 and 553, we can get a random exception from the
> MockDirectoryWrapper, which makes the test fail since we are not
> catching / expecting those exceptions.
> I can't make this fail on trunk even in 1000 runs, but on realtime it
> fails quickly after I merged this morning. I think we should just
> disable the random exceptions for this part and re-enable them after we are
> done; see the patch below. Thoughts?
>
>
> Index: lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
> ===
> --- lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java  (revision 1091721)
> +++ lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java  (working copy)
> @@ -536,7 +536,9 @@
>             fail(testName + " hit IOException after disk space was freed up");
>           }
>         }
> -
> +        // prevent throwing a random exception here!!
> +        final double randomIOExceptionRate = dir.getRandomIOExceptionRate();
> +        dir.setRandomIOExceptionRate(0.0);
>         if (!success) {
>           // Must force the close else the writer can have
>           // open files which cause exc in MockRAMDir.close
> @@ -549,6 +551,7 @@
>           _TestUtil.checkIndex(dir);
>           TestIndexWriter.assertNoUnreferencedFiles(dir, "after 
> writer.close");
>         }
> +        dir.setRandomIOExceptionRate(randomIOExceptionRate);
>
>         // Finally, verify index is not corrupt, and, if
>         // we succeeded, we see all docs changed, and if
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [HUDSON] Lucene-trunk - Build # 1528 - Still Failing

2011-04-13 Thread Michael McCandless
"GC overhead limit exceeded"...

Mike

http://blog.mikemccandless.com

On Tue, Apr 12, 2011 at 10:43 PM, Apache Hudson Server
 wrote:
> Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1528/
>
> 1 tests failed.
> REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads
>
> Error Message:
> Some threads threw uncaught exceptions!
>
> Stack Trace:
> junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
>        at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
>        at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
>        at 
> org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)
>
>
>
>
> Build Log (for compile errors):
> [...truncated 11900 lines...]
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-13 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019356#comment-13019356
 ] 

Varun Thacker commented on LUCENE-3018:
---

I made the mistake of adding only the ant-contrib jar and trying to compile it.
This requires cpptasks, which is not part of ant-contrib.
Link to the cpptasks jar:
http://sourceforge.net/projects/ant-contrib/files/ant-contrib/cpptasks-1.0-beta4/

After adding this jar, I was able to compile the code.


> Lucene Native Directory implementation need automated build
> ---
>
> Key: LUCENE-3018
> URL: https://issues.apache.org/jira/browse/LUCENE-3018
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Build
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Varun Thacker
>Priority: Minor
> Fix For: 4.0
>
>
> Currently the native directory impl in contrib/misc require manual action to 
> compile the c code (partially) documented in 
>  
> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
> yet it would be nice if we had an ant task and documentation for all 
> platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019329#comment-13019329
 ] 

Uwe Schindler commented on LUCENE-3018:
---

Hi,
I suggest using Ant Contrib for compiling the C parts. It integrates well into
our build infrastructure and supplies Ant tasks for compiling and linking:
[http://ant-contrib.sourceforge.net/cpptasks/index.html]

I think your pastebin example is using this. We only need to add the JAR to the
lib folder of Lucene so Ant can load the plugin.

> Lucene Native Directory implementation need automated build
> ---
>
> Key: LUCENE-3018
> URL: https://issues.apache.org/jira/browse/LUCENE-3018
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Build
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Varun Thacker
>Priority: Minor
> Fix For: 4.0
>
>
> Currently the native directory impl in contrib/misc require manual action to 
> compile the c code (partially) documented in 
>  
> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
> yet it would be nice if we had an ant task and documentation for all 
> platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Patch for http_proxy support in solr-ruby client

2011-04-13 Thread Erik Hatcher
Duncan -

I'm the original creator of solr-ruby and put it under Solr's svn.  But many 
folks are now using RSolr, and even in our own (JRuby-based product) we use 
simply Net::HTTP and not a library like solr-ruby or RSolr.  

I don't personally have incentive to continue to maintain solr-ruby, so maybe 
your fork is now "official"?   Though the git craze has made me feel weary 
because so many official versions are simply someone's personal fork.

We can pull solr-ruby from Solr's svn eventually, as something else more 
official takes its place.

Erik



On Apr 13, 2011, at 04:13 , Duncan Robertson wrote:

> Hi Otis,
> 
> The fork you're talking about is mine! But the repo I forked is not official, so
> I am trying to find out where the official version is so I can patch it.
> 
> D
> 
> 
> On 13/04/2011 04:45, "Otis Gospodnetic"  wrote:
> 
>> Hi,
>> 
>> Hm, maybe you are asking where solr-ruby actually lives and is being
>> developed?
>> I'm not sure.  I see it under solr/client/ruby/solr-ruby (no new development
>> in 
>> ages?), but I also see an *active* solr-ruby fork over on
>> https://github.com/bbcrd/solr-ruby .  So if you want to contribute to
>> solr-ruby 
>> on Github, get yourself a Github account, fork that solr-ruby, make your
>> change, 
>> and submit it via the pull request.  This is separate from Solr @ Apache.
>> 
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>> 
>> 
>> 
>> - Original Message 
>>> From: Duncan Robertson 
>>> To: dev@lucene.apache.org
>>> Sent: Tue, April 12, 2011 4:36:17 AM
>>> Subject: Patch for http_proxy support in solr-ruby client
>>> 
>>> Hi,
>>> 
>>> I have a patch for adding http_proxy support to the solr-ruby client.  I
>>> thought the project was managed via Github, but this turns out not to be  
>>> the
>>> case. Is the process the same as for Solr itself?
>>> 
>>> https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e
>>> 
>>> Best,
>>> Duncan
>>> 
>>> 
>>> http://www.bbc.co.uk/
>>> This  e-mail (and any attachments) is confidential and may contain personal
>>> views  which are not the views of the BBC unless specifically stated.
>>> If you have  received it in error, please delete it from your system.
>>> Do not use, copy or  disclose the information in any way nor act in reliance
>>> on 
>>> it and notify the  sender immediately.
>>> Please note that the BBC monitors e-mails sent or  received.
>>> Further communication will signify your consent to  this.
>>> 
>>> 
>>> -
>>> To  unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For  additional commands, e-mail: dev-h...@lucene.apache.org
>>> 
>>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> http://www.bbc.co.uk/
> This e-mail (and any attachments) is confidential and may contain personal 
> views which are not the views of the BBC unless specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in reliance 
> on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.
>   
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-13 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019326#comment-13019326
 ] 

Simon Willnauer commented on LUCENE-3018:
-

Varun,

Pastebin links are not ideal for working on issues here; you can post small
snippets directly or upload a patch so we can review it.
Nevertheless, the example you added to pastebin seems like just a generic
example. Can you try to integrate it into the trunk/lucene/contrib/misc/build.xml
file and make it compile NativePosixUtil.cpp? Once you have that, you can
create a patch with svn diff > LUCENE-3018.patch and upload it. If you need
3rd-party libs like ant-contrib, you can upload them here too.

Simon

> Lucene Native Directory implementation need automated build
> ---
>
> Key: LUCENE-3018
> URL: https://issues.apache.org/jira/browse/LUCENE-3018
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Build
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Varun Thacker
>Priority: Minor
> Fix For: 4.0
>
>
> Currently the native directory impl in contrib/misc require manual action to 
> compile the c code (partially) documented in 
>  
> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
> yet it would be nice if we had an ant task and documentation for all 
> platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Numerical ids for terms?

2011-04-13 Thread Toke Eskildsen
On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote:
> Hi -- has there been any effort to create a numerical representation of
> Lucene indices? That is, to use the Lucene Directory backend as a large
> term-document matrix at index level. As this would require a bijective
> mapping between terms (per-field, as customary in Lucene) and a numerical
> index (an integer, monotonic from 0 to numTerms()-1), I guess this requires
> some special modifications to the Lucene core.

Maybe you're thinking about something like TermsEnum?
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/TermsEnum.html
It provides ordinal access to terms, with ordinals represented as longs. To
make the access work at index level rather than segment level, you will have
to merge the ordinals from the different segments.

Unfortunately, support for ordinal-based term access is optional per codec,
and the default codec does not provide it, so you will have to explicitly
select a codec when you build your index.
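
For example, a hedged sketch of ordinal-based lookup (method names follow the
TermsEnum javadocs and may differ slightly on the 2011 trunk; it assumes a
codec that does support ords):

    import java.io.IOException;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    public class TermByOrdinal {
      // Returns the term text at the given per-segment ordinal.
      public static String termForOrd(Terms terms, long ord) throws IOException {
        TermsEnum te = terms.iterator();
        te.seekExact(ord);          // unsupported if the codec lacks term ords
        BytesRef term = te.term();
        return term.utf8ToString();
      }
    }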


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-13 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019298#comment-13019298
 ] 

Simon Willnauer commented on LUCENE-2956:
-

I committed that patch and merged with trunk

> Support updateDocument() with DWPTs
> ---
>
> Key: LUCENE-2956
> URL: https://issues.apache.org/jira/browse/LUCENE-2956
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2956.patch, LUCENE-2956.patch
>
>
> With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
> the delete part of an updateDocument() is flushed and committed separately 
> from the corresponding new document.
> We need to make sure that updateDocument() is always an atomic operation from 
> a IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
> details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



TestIndexWriterDelete#testUpdatesOnDiskFull can false fail

2011-04-13 Thread Simon Willnauer
In TestIndexWriterDelete#testUpdatesOnDiskFull, especially between lines
538 and 553, we can get a random exception from the
MockDirectoryWrapper, which makes the test fail since we are not
catching / expecting those exceptions.
I can't make this fail on trunk even in 1000 runs, but on realtime it
fails quickly after I merged this morning. I think we should just
disable the random exceptions for this part and re-enable them after we are
done; see the patch below. Thoughts?


Index: lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
===
--- lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java  (revision 1091721)
+++ lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java  (working copy)
@@ -536,7 +536,9 @@
             fail(testName + " hit IOException after disk space was freed up");
           }
         }
-
+        // prevent throwing a random exception here!!
+        final double randomIOExceptionRate = dir.getRandomIOExceptionRate();
+        dir.setRandomIOExceptionRate(0.0);
         if (!success) {
           // Must force the close else the writer can have
           // open files which cause exc in MockRAMDir.close
@@ -549,6 +551,7 @@
           _TestUtil.checkIndex(dir);
           TestIndexWriter.assertNoUnreferencedFiles(dir, "after writer.close");
         }
+        dir.setRandomIOExceptionRate(randomIOExceptionRate);

         // Finally, verify index is not corrupt, and, if
         // we succeeded, we see all docs changed, and if

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7061 - Failure

2011-04-13 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7061/

14 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-110: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/1/test8006296579247039339tmp/_e_2.doc
 (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-110:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/1/test8006296579247039339tmp/_e_2.doc
 (Too many open files in system)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/1/test8006296579247039339tmp/_e_2.doc
 (Too many open files in system)
at 
org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:833)


REGRESSION:  
org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeDocCount0

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test8275723700845306539tmp/_0_0.tiv
 (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test8275723700845306539tmp/_0_0.tiv
 (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:448)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:312)
at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:348)
at 
org.apache.lucene.index.codecs.VariableGapTermsIndexWriter.<init>(VariableGapTermsIndexWriter.java:161)
at 
org.apache.lucene.index.codecs.standard.StandardCodec.fieldsConsumer(StandardCodec.java:58)
at 
org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:64)
at 
org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:54)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:78)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:103)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:65)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:55)
at 
org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2497)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2462)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180)
at 
org.apache.lucene.index.TestIndexWriterMergePolicy.addDoc(TestIndexWriterMergePolicy.java:221)
at 
org.apache.lucene.index.TestIndexWriterMergePolicy.testMergeDocCount0(TestIndexWriterMergePolicy.java:189)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)


REGRESSION:  
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull

Error Message:
addIndexes(Directory[]) + optimize() hit IOException after disk space was freed 
up

Stack Trace:
junit.framework.AssertionFailedError: addIndexes(Directory[]) + optimize() hit 
IOException after disk space was freed up
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
at 
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:327)


REGRESSION:  org.apache.lucene.index.TestLongPostings.testLongPostings

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/longpostings.6978566692871504462/_14_0.tib
 (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/longpostings.6978566692871504462/_14_0.tib
 (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:448)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSD

[jira] [Updated] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-13 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2956:


Attachment: LUCENE-2956.patch

Here is an updated patch that fixes some spellings, adds atomic updates for
Term[] and Query[], and removes the LogMergePolicy restriction from
TestRollingUpdates.
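
For context, the invariant under discussion, as a minimal Java sketch (the
"id" field name is hypothetical; only IndexWriter.updateDocument(Term,
Document) is assumed from the API):

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class AtomicUpdateSketch {
      // updateDocument = delete-by-term plus add, applied atomically: a
      // commit or NRT reader must never observe the delete without the add.
      public static void replaceById(IndexWriter writer, String id, Document doc)
          throws IOException {
        writer.updateDocument(new Term("id", id), doc);
      }
    }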

> Support updateDocument() with DWPTs
> ---
>
> Key: LUCENE-2956
> URL: https://issues.apache.org/jira/browse/LUCENE-2956
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2956.patch, LUCENE-2956.patch
>
>
> With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
> the delete part of an updateDocument() is flushed and committed separately 
> from the corresponding new document.
> We need to make sure that updateDocument() is always an atomic operation from 
> a IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
> details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Patch for http_proxy support in solr-ruby client

2011-04-13 Thread Duncan Robertson
Hi Otis,

The fork you're talking about is mine! But the repo I forked is not official, so
I am trying to find out where the official version is so I can patch it.

D


On 13/04/2011 04:45, "Otis Gospodnetic"  wrote:

> Hi,
> 
> Hm, maybe you are asking where solr-ruby actually lives and is being
> developed?
> I'm not sure.  I see it under solr/client/ruby/solr-ruby (no new development
> in 
> ages?), but I also see an *active* solr-ruby fork over on
> https://github.com/bbcrd/solr-ruby .  So if you want to contribute to
> solr-ruby 
> on Github, get yourself a Github account, fork that solr-ruby, make your
> change, 
> and submit it via the pull request.  This is separate from Solr @ Apache.
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
>> From: Duncan Robertson 
>> To: dev@lucene.apache.org
>> Sent: Tue, April 12, 2011 4:36:17 AM
>> Subject: Patch for http_proxy support in solr-ruby client
>> 
>> Hi,
>> 
>> I have a patch for adding http_proxy support to the solr-ruby client.  I
>> thought the project was managed via Github, but this turns out not to be  the
>> case. Is the process the same as for Solr itself?
>> 
>> https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e
>> 
>> Best,
>> Duncan
>> 
>> 
>> http://www.bbc.co.uk/
>> This  e-mail (and any attachments) is confidential and may contain personal
>> views  which are not the views of the BBC unless specifically stated.
>> If you have  received it in error, please delete it from your system.
>> Do not use, copy or  disclose the information in any way nor act in reliance
>> on 
>> it and notify the  sender immediately.
>> Please note that the BBC monitors e-mails sent or  received.
>> Further communication will signify your consent to  this.
>> 
>> 
>> -
>> To  unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For  additional commands, e-mail: dev-h...@lucene.apache.org
>> 
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-13 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019234#comment-13019234
 ] 

Simon Willnauer commented on LUCENE-2956:
-

bq. Though it worries me a little how complex the whole delete/update logic is 
becoming (not only the part this patch adds).
I could not agree more. It has been very complex making all the tests pass and
figuring out all the nifty corner cases here. A different, somewhat simpler
approach would be great. Eventually, for Searchable RAM Buffers, we might need
to switch to seq. ids anyway, but I think for landing DWPT on trunk we can go
with the current approach.
I will update the latest patch, commit it to the branch, and merge with trunk
again. Once that is done I will set up a Hudson build for RT so we give it a
little exercise while we prepare to move to trunk.

 

> Support updateDocument() with DWPTs
> ---
>
> Key: LUCENE-2956
> URL: https://issues.apache.org/jira/browse/LUCENE-2956
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2956.patch
>
>
> With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
> the delete part of an updateDocument() is flushed and committed separately 
> from the corresponding new document.
> We need to make sure that updateDocument() is always an atomic operation from 
> a IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
> details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org