RE: Solr 1.4.1 compatible with Lucene 3.0.1?

2011-04-12 Thread Uwe Schindler
Hi RichSimon,

In short: Solr is only compatible with the Lucene version it is shipped with (or 
other patch levels of the same minor version). Solr 1.4 works with Lucene 2.9, 
so you must have some Lucene 2.9.x in your classpath.
Lucene 3.0 is a major release of Lucene that removed all the backwards-compatibility 
layers still present in Lucene 2.9 (the functionality itself is identical in 2.9 
and 3.0). Unfortunately, Solr 1.4 still uses some of the deprecated code, so it's 
not compatible.

About a year ago, Lucene and Solr merged into one Apache project, and Lucene and 
Solr are now released together with the same version number. The latest Lucene and 
Solr release is 3.1. But again, Solr should only be used with the Lucene version 
it is shipped with. There was never a separate release of Solr for Lucene 3.0; you 
can only upgrade or downgrade.

The matchVersion parameters have little to do with API compatibility (they are 
sometimes also used for that, to preserve backwards compatibility). They mostly 
control the behavior of some analyzers, i.e. how text tokenization is done, so 
that analysis stays compatible between different versions.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: RichSimon [mailto:richard_si...@hms.harvard.edu]
 Sent: Monday, April 11, 2011 4:37 PM
 To: solr-...@lucene.apache.org
 Subject: Solr 1.4.1 compatible with Lucene 3.0.1?
 
 
 Short story: I am using Lucene 3.0.1, and I'm trying to run Solr 1.4.1. I get 
 an
 error starting the embedded Solr server that says it cannot find the method
 FSDirectory.getDirectory. The release notes for Solr 1.4.1 say it is compatible
 with Lucene 2.9.3, and I see that Lucene 3.0.1 no longer has the
 FSDirectory.getDirectory method. Downgrading Lucene to 2.9.x is
 not an option for me. What version of Solr should I use for Lucene 3.0.1?
 (We're just starting with Solr, so changing that version is not hard.) Or, do 
 I
 have to upgrade both Solr and Lucene?
 
 Thanks,
 
 -Rich
 
 Here's the long story:
 I am using Lucene 3.0.1, and I'm trying to run Solr 1.4.1. I have not used any
 other version of Lucene. We have an existing project using Lucene 3.0.1, and
 we want to start using Solr. When I try to initialize an embedded Solr server,
 like so:
 
 
    String solrHome = PATH_TO_SOLR_HOME;
    File home = new File(solrHome);
    File solrXML = new File(home, "solr.xml");
  
    coreContainer = new CoreContainer();
    coreContainer.load(solrHome, solrXML);
  
    embeddedSolr = new EmbeddedSolrServer(coreContainer, SOLR_CORE);
 
 
 
 [04-08 11:48:39] ERROR CoreContainer [main]:
 java.lang.NoSuchMethodError:
 org.apache.lucene.store.FSDirectory.getDirectory(Ljava/lang/String;)Lorg/ap
 ache/lucene/store/FSDirectory;
 at
 org.apache.solr.spelling.AbstractLuceneSpellChecker.initIndex(AbstractLuce
 neSpellChecker.java:186)
   at
 org.apache.solr.spelling.AbstractLuceneSpellChecker.init(AbstractLuceneSpe
 llChecker.java:101)
   at
 org.apache.solr.spelling.IndexBasedSpellChecker.init(IndexBasedSpellCheck
 er.java:56)
   at
 org.apache.solr.handler.component.SpellCheckComponent.inform(SpellChe
 ckComponent.java:274)
   at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:50
 8)
   at org.apache.solr.core.SolrCore.(SolrCore.java:588)
   at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
 
 
 Looking at Google posts about this, it seemed that this can be caused by a
 version mismatch between the Lucene version in use and the one Solr tries
 to use. I noticed a Lucene version tag in the example solrconfig.xml that I’m
 modifying:
 
    <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
 
 I tried changing it to LUCENE_301, then to LUCENE_30, and also commenting it
 out, but I still get the same error. Using
 LucenePackage.get().getImplementationVersion() shows this as the Lucene
 version:
 
 Lucene version: 3.0.1 912433 - 2010-02-21 23:51:22
 
 I also printed my classpath and found the following lucene jars:
 lucene-analyzers-3.0.1.jar
 lucene-core-3.0.1.jar
 lucene-highlighter-3.0.1.jar
 lucene-memory-3.0.1.jar
 lucene-misc-2.9.3.jar
 lucene-queries-2.9.3.jar
 lucene-snowball-2.9.3.jar
 lucene-spellchecker-2.9.3.jar
 
 The FSDirectory class is in lucene-core. I decompiled the class file in the 
 jar,
 and did not see a getDirectory method. Also, I used a ClassLoader statement
 to get an instance of the FSDirectory class my code is using, and printed out
 the methods; no getDirectory method.
 
 I gather from the Lucene Javadoc that the getDirectory method is in
 FSDirectory for 2.4.0 and for 2.9.0, but is gone in 3.0.1 (the version I'm 
 using).
 
 Is Lucene 3.0.1 completely incompatible with Solr 1.4.1? Is there some way to
 use the luceneMatchVersion tag to tell Solr what version I want to use?
 
 
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-12 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018743#comment-13018743
 ] 

Simon Willnauer commented on LUCENE-2956:
-

bq. I think I have an idea, however can you explain the ticketQueue?

Sure. Since with DWPTs the flush process is concurrent and several DWPTs can 
flush at the same time, we must maintain the order of the flushes before we can 
apply a flushed segment and the frozen global deletes it is buffering. The reason 
is that the global deletes mark a certain point in time at which we took a DWPT 
out of rotation and froze the global deletes.

Example: DWPT 'A' starts flushing and freezes the global deletes; then DWPT 'B' 
starts flushing and freezes all deletes that occurred since 'A' started. If 'B' 
finishes before 'A', we need to wait until 'A' is done; otherwise the deletes 
frozen by 'B' are not applied to 'A' and we might miss deleting documents in 
'A'.

The ticket queue simply ensures that we publish the frozen deletes and the newly 
created segments in the same order in which the corresponding DWPTs started 
flushing. When a DWPT finishes flushing, we update its ticket and then check 
whether the head of the queue can be removed/published. If so, we continue 
publishing until the head of the queue cannot be published yet or the queue is 
empty.
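That ordering discipline can be sketched independently of the Lucene internals. The following is a minimal, self-contained model (hypothetical names, not the actual DocumentsWriter code): tickets are enqueued in the order flushes start, a ticket may complete out of order, but it is only published once every earlier ticket has been published.

```java
import java.util.ArrayDeque;

// Sketch of the ticket-queue idea: publish flushed segments in the
// order the flushes *started*, regardless of completion order.
public class TicketQueue {
    public static final class Ticket {
        final String segment;
        boolean done;                      // set when the flush finished
        Ticket(String segment) { this.segment = segment; }
    }

    private final ArrayDeque<Ticket> queue = new ArrayDeque<>();
    private final StringBuilder published = new StringBuilder();

    // Called when a DWPT is taken out of rotation and starts flushing.
    public synchronized Ticket flushStarted(String segment) {
        Ticket t = new Ticket(segment);
        queue.addLast(t);
        return t;
    }

    // Called when a flush finishes: mark the ticket done, then publish
    // from the head of the queue as far as possible.
    public synchronized void flushFinished(Ticket t) {
        t.done = true;
        while (!queue.isEmpty() && queue.peekFirst().done) {
            published.append(queue.pollFirst().segment).append(' ');
        }
    }

    public synchronized String publishedOrder() {
        return published.toString().trim();
    }
}
```

Even when 'B' finishes before 'A', nothing is published until 'A' completes, and the final order is still "A B".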


 Support updateDocument() with DWPTs
 ---

 Key: LUCENE-2956
 URL: https://issues.apache.org/jira/browse/LUCENE-2956
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2956.patch


 With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
 the delete part of an updateDocument() is flushed and committed separately 
 from the corresponding new document.
 We need to make sure that updateDocument() is always an atomic operation from 
 a IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
 details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: termInfosIndexDivisor typo in the Solr-UIMA config?

2011-04-12 Thread Tommaso Teofili
Hi Otis,
you're right, that can safely be changed to setTermIndexDivisor (note that
line is commented out, so it's just a matter of consistency, I think).
Tommaso

2011/4/11 Otis Gospodnetic otis_gospodne...@yahoo.com

 Hi,

 I was looking at term index divisor and spotted this:

 .../lucene-solr-3.1$ ffxg -i IndexDivisor
 ./solr/src/test-files/solr/conf/solrconfig-termindex.xml:<int name="setTermIndexDivisor">12</int>
 ./solr/src/test-files/solr/conf/solrconfig-xinclude.xml:<int name="setTermIndexDivisor">12</int>
 ./solr/contrib/uima/src/test/resources/solr-uima/conf/solrconfig.xml:  <!-- To set the termInfosIndexDivisor, do this: -->
 ./solr/contrib/uima/src/test/resources/solr-uima/conf/solrconfig.xml: <int name="termInfosIndexDivisor">12</int> </indexReaderFactory>   <=== HERE
 ./solr/example/solr/conf/solrconfig.xml:  <!-- By explicitly declaring the Factory, the termIndexDivisor can
 ./solr/example/solr/conf/solrconfig.xml:       <int name="setTermIndexDivisor">12</int>


 Is that termInfosIndexDivisor a typo in there?  Should it be
 setTermIndexDivisor like in the other configs?


 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-12 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018762#comment-13018762
 ] 

Simon Willnauer commented on LUCENE-3018:
-

Varun, are you getting along here so far?

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0


 Currently the native directory impl in contrib/misc require manual action to 
 compile the c code (partially) documented in 
  
 https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
 yet it would be nice if we had an ant task and documentation for all 
 platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Patch for http_proxy support in solr-ruby client

2011-04-12 Thread Duncan Robertson
Hi,

I have a patch for adding http_proxy support to the solr-ruby client. I
thought the project was managed via GitHub, but this turns out not to be the
case. Is the process the same as for Solr itself?

https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e

Best,
Duncan


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect

2011-04-12 Thread JIRA
DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
-

 Key: LUCENE-3022
 URL: https://issues.apache.org/jira/browse/LUCENE-3022
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 3.1, 2.9.4
Reporter: Johann Höchtl
Priority: Minor


When using the DictionaryCompoundWordTokenFilter with a German dictionary, I 
got strange behaviour:
The German word "streifenbluse" (blouse with stripes) was decompounded to 
"streifen" (stripe) and "reifen" (tire), which makes no sense at all.
I thought the flag onlyLongestMatch would fix this, because "streifen" is 
longer than "reifen", but it had no effect.
So I reviewed the source code and found the problem:
[code]
protected void decomposeInternal(final Token token) {
    // Only words longer than minWordSize get processed
    if (token.length() < this.minWordSize) {
        return;
    }

    char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.buffer());

    for (int i = 0; i < token.length() - this.minSubwordSize; ++i) {
        Token longestMatchToken = null;
        for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
            if (i + j > token.length()) {
                break;
            }
            if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
                if (this.onlyLongestMatch) {
                    if (longestMatchToken != null) {
                        if (longestMatchToken.length() < j) {
                            longestMatchToken = createToken(i, j, token);
                        }
                    } else {
                        longestMatchToken = createToken(i, j, token);
                    }
                } else {
                    tokens.add(createToken(i, j, token));
                }
            }
        }
        if (this.onlyLongestMatch && longestMatchToken != null) {
            tokens.add(longestMatchToken);
        }
    }
}
[/code]

should be changed to 

[code]
protected void decomposeInternal(final Token token) {
    // Only words longer than minWordSize get processed
    if (token.termLength() < this.minWordSize) {
        return;
    }

    char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.termBuffer());

    Token longestMatchToken = null;
    for (int i = 0; i < token.termLength() - this.minSubwordSize; ++i) {
        for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
            if (i + j > token.termLength()) {
                break;
            }
            if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
                if (this.onlyLongestMatch) {
                    if (longestMatchToken != null) {
                        if (longestMatchToken.termLength() < j) {
                            longestMatchToken = createToken(i, j, token);
                        }
                    } else {
                        longestMatchToken = createToken(i, j, token);
                    }
                } else {
                    tokens.add(createToken(i, j, token));
                }
            }
        }
    }
    if (this.onlyLongestMatch && longestMatchToken != null) {
        tokens.add(longestMatchToken);
    }
}
[/code]

This way only the longest match is actually indexed, and the onlyLongestMatch 
flag behaves as its name suggests.
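To illustrate the intended semantics, here is a simplified, self-contained model of the proposed onlyLongestMatch behaviour. It uses plain substring matching instead of the filter's char-buffer API, so all names and the scanning details are illustrative, not the actual filter code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model: scan all substrings of the compound word and, with
// onlyLongestMatch, keep only the single longest dictionary hit instead
// of every overlapping hit.
public class LongestMatchDemo {
    public static List<String> decompose(String word, Set<String> dict,
                                         boolean onlyLongestMatch) {
        List<String> tokens = new ArrayList<>();
        String longest = null;
        for (int i = 0; i < word.length(); i++) {
            for (int j = i + 1; j <= word.length(); j++) {
                String sub = word.substring(i, j);
                if (dict.contains(sub)) {
                    if (onlyLongestMatch) {
                        if (longest == null || sub.length() > longest.length()) {
                            longest = sub;   // remember only the longest hit
                        }
                    } else {
                        tokens.add(sub);     // old behaviour: every hit
                    }
                }
            }
        }
        if (onlyLongestMatch && longest != null) {
            tokens.add(longest);
        }
        return tokens;
    }
}
```

With the dictionary {streifen, reifen, bluse}, "streifenbluse" yields all three overlapping hits without the flag, but only "streifen" with it.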

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Numerical ids for terms?

2011-04-12 Thread Gregor Heinrich
Hi -- has there been any effort to create a numerical representation of Lucene 
indices? That is, to use the Lucene Directory backend as a large term-document 
matrix at index level. As this would require a bijective mapping between terms 
(per field, as customary in Lucene) and a numerical index (integers, monotonic 
from 0 to numTerms()-1), I guess this requires some special modifications to the 
Lucene core.


Another interesting feature would be to use Lucene's Directory backend for 
storage of large dense matrices, for instance for data-mining tasks from within 
Lucene.


Any suggestions?

Best regards and thanks

gregor


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Numerical ids for terms?

2011-04-12 Thread Earwin Burrfoot
On Tue, Apr 12, 2011 at 13:41, Gregor Heinrich gre...@arbylon.net wrote:
 Hi -- has there been any effort to create a numerical representation of
 Lucene indices. That is, to use the Lucene Directory backend as a large
 term-document matrix at index level. As this would require bijective mapping
 between terms (per-field, as customary in Lucene) and a numerical index
 (integer, monotonous from 0 to numTerms()-1), I guess this requires some
 some special modifications to the Lucene core.
Lucene index already provides term -> id mapping in some form.
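The essence of that mapping: because the term dictionary is stored in sorted order, a bijective term <-> ordinal mapping falls out of binary search. A self-contained sketch of the idea (illustrative code, not the Lucene API):

```java
import java.util.Arrays;

// A sorted term dictionary gives term <-> id "for free": the id of a
// term is its position in sorted order; binary search recovers the id
// from the term, and array lookup recovers the term from the id.
public class TermOrdinals {
    private final String[] sortedTerms;

    public TermOrdinals(String[] terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);          // Lucene keeps terms sorted per field
    }

    public int idOf(String term) {         // term -> id, in 0..numTerms()-1
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? idx : -1;        // -1: term not in the dictionary
    }

    public String termOf(int id) {         // id -> term
        return sortedTerms[id];
    }

    public int numTerms() {
        return sortedTerms.length;
    }
}
```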

 Another interesting feature would be to use Lucene's Directory backend for
 storage of large dense matrices, for instance to data-mining tasks from
 within Lucene.
Lucene's Directory is a dumb abstraction for random-access named
write-once byte streams.
It doesn't add /any/ value over mmap.

 Any suggestions?
*troll mode on* Use numpy/scipy? :)

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3017) FST should differentiate between final vs non-final stop nodes

2011-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018819#comment-13018819
 ] 

Michael McCandless commented on LUCENE-3017:


I hear you :)

I think Lucene's needs put pressure on the traditional FST bounds... so we 
need to stretch things a bit.

 FST should differentiate between final vs non-final stop nodes
 --

 Key: LUCENE-3017
 URL: https://issues.apache.org/jira/browse/LUCENE-3017
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3017.patch


 I'm breaking out this one improvement from LUCENE-2948...
 Currently, if a node has no outgoing edges (a stop node) the FST
 forcefully marks this as a final node, but it need not do this.  Ie,
 whether that node is final or not should be orthogonal to whether it
 has arcs leaving or not.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3023) Land DWPT on trunk

2011-04-12 Thread Simon Willnauer (JIRA)
Land DWPT on trunk
--

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
we can proceed landing the DWPT development on trunk soon. I think one of the 
bigger issues here is to make sure that all JavaDocs for IW etc. are still 
correct though. I will start going through that first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3023) Land DWPT on trunk

2011-04-12 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3023:
---

Assignee: Simon Willnauer

 Land DWPT on trunk
 --

 Key: LUCENE-3023
 URL: https://issues.apache.org/jira/browse/LUCENE-3023
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0


 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
 we can proceed landing the DWPT development on trunk soon. I think one of the 
 bigger issues here is to make sure that all JavaDocs for IW etc. are still 
 correct though. I will start going through that first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3021) randomize skipInterval in tests

2011-04-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3021.
-

   Resolution: Fixed
Fix Version/s: 4.0

Committed revision 1091408.

 randomize skipInterval in tests
 ---

 Key: LUCENE-3021
 URL: https://issues.apache.org/jira/browse/LUCENE-3021
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3021.patch, LUCENE-3021.patch


 we probably don't test the multi-level skipping very well, but skipInterval 
 etc is now private to the codec, so for better test coverage we should 
 parameterize it to the postings writers, and randomize it via mockrandomcodec.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3024) If index has more than Integer.MAX_VALUE terms, seeking can hit AIOOBE due to long/int overflow

2011-04-12 Thread Michael McCandless (JIRA)
If index has more than Integer.MAX_VALUE terms, seeking can hit AIOOBE due to 
long/int overflow
--

 Key: LUCENE-3024
 URL: https://issues.apache.org/jira/browse/LUCENE-3024
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1.1, 3.2, 4.0


Tom hit a new long/int overflow case: 
http://markmail.org/thread/toyl2ujcl4suqvf3

This is a regression, in 3.1, introduced with LUCENE-2075.

Worse, our Test2BTerms failed to catch this, so I've fixed that test to show 
the failure.
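For readers unfamiliar with this failure class, here is a minimal illustration of the long/int overflow pattern (not the actual LUCENE-2075 code; the record size and method names are illustrative):

```java
// Multiplying a large ordinal by a record size in 32-bit arithmetic
// wraps negative; using the negative result as an array index then
// triggers an ArrayIndexOutOfBoundsException.
public class OverflowDemo {
    static final int BYTES_PER_TERM = 8;

    // Buggy: the multiplication happens in int before widening to long.
    public static long offsetBuggy(int termOrd) {
        return termOrd * BYTES_PER_TERM;       // overflows for large ordinals
    }

    // Fixed: widen one operand to long before multiplying.
    public static long offsetFixed(int termOrd) {
        return (long) termOrd * BYTES_PER_TERM;
    }
}
```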

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3025) TestIndexWriterExceptions fails on windows (2)

2011-04-12 Thread Robert Muir (JIRA)
TestIndexWriterExceptions fails on windows (2)
--

 Key: LUCENE-3025
 URL: https://issues.apache.org/jira/browse/LUCENE-3025
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


Note: this is a different problem than LUCENE-2991 (I disabled the assert for 
that problem).


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3025) TestIndexWriterExceptions fails on windows (2)

2011-04-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018849#comment-13018849
 ] 

Robert Muir commented on LUCENE-3025:
-

junit-sequential:
[junit] Testsuite: org.apache.lucene.index.TestIndexWriterExceptions
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.681 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterExceptions 
-Dtestmethod=testExceptionsDuringCommit -Dtests.seed=-2008642232753917842:-38943
12978511486973 -Dtests.codec=MockRandom
[junit] NOTE: test params are: codec=MockRandom, locale=sl, 
timezone=Asia/Pyongyang
[junit] NOTE: all tests run in this JVM:
[junit] [TestIndexWriterExceptions]
[junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 
(32-bit)/cpus=4,threads=1,free=14526984,total=32202752
[junit] -  ---
[junit] Testcase: 
testExceptionsDuringCommit(org.apache.lucene.index.TestIndexWriterExceptions):  
  FAILED
[junit]
[junit] junit.framework.AssertionFailedError:
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit] at 
org.apache.lucene.index.TestIndexWriterExceptions.testExceptionsDuringCommit(TestIndexWriterExceptions.java:867)
[junit]
[junit]
[junit] Test org.apache.lucene.index.TestIndexWriterExceptions FAILED

 TestIndexWriterExceptions fails on windows (2)
 --

 Key: LUCENE-3025
 URL: https://issues.apache.org/jira/browse/LUCENE-3025
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir

 Note: this is a different problem than LUCENE-2991 (I disabled the assert for 
 that problem).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3025) TestIndexWriterExceptions fails on windows (2)

2011-04-12 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018864#comment-13018864
 ] 

Simon Willnauer commented on LUCENE-3025:
-

bq. Note: this is a different problem than LUCENE-2991 (I disabled the assert 
for that problem).

Robert, I think this is the same problem as LUCENE-2991. I actually wonder 
whether the problem still occurs if we close the directory first and then reopen 
another one to see if the files are still there. I don't have a Windows machine 
ready to try it myself, but what I would be interested in is: if this situation 
occurs, are there still open files around? I mean, if we are holding on to the 
files and this test doesn't fail on Unix, then we have a problem with 
MockDirectoryWrapper, since it is supposed to 'act' like Windows in that respect.

When I look at w.rollback() I see a possibility that this test can fail under 
Windows, since it calls closeInternal(false), which does not wait for merges. 
So some of the files can still be referenced by the merger. I wonder if it 
makes sense to call w.close() to make sure all files are done before checking 
whether the files have been removed.


 TestIndexWriterExceptions fails on windows (2)
 --

 Key: LUCENE-3025
 URL: https://issues.apache.org/jira/browse/LUCENE-3025
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir

 Note: this is a different problem than LUCENE-2991 (I disabled the assert for 
 that problem).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Numerical ids for terms?

2011-04-12 Thread Gregor Heinrich
Thanks for the quick response. Please be a bit more concrete than "some form of 
term -> id mapping": do you refer to subclassing SegmentReader with an 
appropriate Map implementation, or is there a tested structure in the existing 
API that I've overlooked? Regarding a Directory abstraction backed by a memory 
mapping API: my question refers to using the Lucene API because, even if it may 
be perceived as dumb, it hides a lot of boilerplate code. Are there any efforts 
going on regarding this?


Cheers

gregor

On 4/12/11 1:21 PM, Earwin Burrfoot wrote:

On Tue, Apr 12, 2011 at 13:41, Gregor Heinrichgre...@arbylon.net  wrote:

Hi -- has there been any effort to create a numerical representation of
Lucene indices. That is, to use the Lucene Directory backend as a large
term-document matrix at index level. As this would require bijective mapping
between terms (per-field, as customary in Lucene) and a numerical index
(integer, monotonous from 0 to numTerms()-1), I guess this requires some
some special modifications to the Lucene core.

Lucene index already provides term -> id mapping in some form.


Another interesting feature would be to use Lucene's Directory backend for
storage of large dense matrices, for instance to data-mining tasks from
within Lucene.

Lucene's Directory is a dumb abstraction for random-access named
write-once byte streams.
It doesn't add /any/ value over mmap.


Any suggestions?

*troll mode on* Use numpy/scipy? :)



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2335) FunctionQParser can't handle fieldnames containing whitespace

2011-04-12 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2335.


   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Hoss Man

Committed revision 1091516.


 FunctionQParser can't handle fieldnames containing whitespace
 -

 Key: SOLR-2335
 URL: https://issues.apache.org/jira/browse/SOLR-2335
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1
Reporter: Miriam Doelle
Assignee: Hoss Man
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2335.patch, SOLR-2335.patch


 FunctionQParser has some simplistic assumptions about what types of field 
 names it will deal with, in particular it can't deal with field names 
 containing whitespaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-04-12 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018951#comment-13018951
 ] 

Varun Thacker commented on LUCENE-3018:
---

This is a small ant task I wrote to build a shared library from a .cpp file. 
http://pastebin.com/diTXru9w

I will update the code to add options for the command-line parameters that are 
required to build NativePosixUtil.cpp.

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0


 Currently the native directory impl in contrib/misc require manual action to 
 compile the c code (partially) documented in 
  
 https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
 yet it would be nice if we had an ant task and documentation for all 
 platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3017) FST should differentiate between final vs non-final stop nodes

2011-04-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3017.


Resolution: Fixed

 FST should differentiate between final vs non-final stop nodes
 --

 Key: LUCENE-3017
 URL: https://issues.apache.org/jira/browse/LUCENE-3017
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3017.patch


 I'm breaking out this one improvement from LUCENE-2948...
 Currently, if a node has no outgoing edges (a stop node) the FST
 forcefully marks this as a final node, but it need not do this.  Ie,
 whether that node is final or not should be orthogonal to whether it
 has arcs leaving or not.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2466) CommonsHttpSolrServer will retry a query even if _maxRetries is 0

2011-04-12 Thread JIRA
CommonsHttpSolrServer will retry a query even if _maxRetries is 0
-

 Key: SOLR-2466
 URL: https://issues.apache.org/jira/browse/SOLR-2466
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1, 1.4.1, 4.0
Reporter: Tomás Fernández Löbbe


The HttpClient library used by CommonsHttpSolrServer will, by default, retry a 
request that failed on the server side 3 times, even if the _maxRetries field 
of CommonsHttpSolrServer is set to 0.
The retry count should be managed in just one place, and CommonsHttpSolrServer 
seems to be the right one. 
CommonsHttpSolrServer should override that HttpClient default to 0 retries and 
manage the retry count with the value of the field _maxRetries.
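The "one place" idea can be sketched abstractly (hypothetical names, not the SolrJ or HttpClient API): the transport is configured for zero automatic retries, and a single loop is the only code that consults the retry budget, so _maxRetries == 0 means exactly one attempt.

```java
import java.util.function.Supplier;

// Hedged sketch of managing retries in one place. "send" stands in for the
// underlying HTTP call, which is assumed to do no retrying of its own.
public class RetryOnce {
    static <T> T withRetries(int maxRetries, Supplier<T> send) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return send.get();
            } catch (RuntimeException e) {
                last = e;                 // retry only while attempts remain
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            withRetries(0, () -> { calls[0]++; throw new RuntimeException("boom"); });
        } catch (RuntimeException expected) { }
        System.out.println(calls[0]);     // maxRetries == 0 -> exactly one attempt
    }
}
```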




[jira] [Commented] (LUCENE-3024) If index has more than Integer.MAX_VALUE terms, seeking can hit AIOOBE due to long/int overflow

2011-04-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019003#comment-13019003
 ] 

Michael McCandless commented on LUCENE-3024:


Fixed in 3.2, 4.0.

I'm leaving this open in case we ever release 3.1.1.

 If index has more than Integer.MAX_VALUE terms, seeking can hit AIOOBE due to 
 long/int overflow
 --

 Key: LUCENE-3024
 URL: https://issues.apache.org/jira/browse/LUCENE-3024
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: LUCENE-3024.patch


 Tom hit a new long/int overflow case: 
 http://markmail.org/thread/toyl2ujcl4suqvf3
 This is a regression, in 3.1, introduced with LUCENE-2075.
 Worse, our Test2BTerms failed to catch this, so I've fixed that test to show 
 the failure.
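This class of bug is easy to reproduce in isolation. A minimal sketch (not the actual LUCENE-2075 code; the field names are made up) of how int arithmetic on a term count past Integer.MAX_VALUE goes negative:

```java
// Illustration of the long/int overflow class behind the AIOOBE: once a term
// ordinal exceeds Integer.MAX_VALUE, narrowing it to int wraps negative, and
// the resulting array index is garbage.
public class OrdOverflow {
    public static void main(String[] args) {
        long termCount = 3_000_000_000L;   // more than Integer.MAX_VALUE terms
        int indexInterval = 128;
        // Buggy pattern: the narrowing cast happens before the division,
        // so the value wraps to a negative int first.
        int buggyOffset = (int) termCount / indexInterval;
        // Correct pattern: keep the whole computation in long.
        long safeOffset = termCount / indexInterval;
        System.out.println(buggyOffset);
        System.out.println(safeOffset);
    }
}
```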




[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs

2011-04-12 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019019#comment-13019019
 ] 

Michael Busch commented on LUCENE-2956:
---

Cool patch! :)
Though it worries me a little how complex the whole delete/update logic is 
becoming (not only the part this patch adds).

Originally we decided to not go with sequenceIDs partly because we thought the 
implementation might be too complex, but I think it'd be simpler than the 
current approach that uses bits.

The sequenceIDs approach we had in the beginning was also completely lockless 
in a very very simple way.
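The sequence-ID idea referred to above can be sketched in a few lines (a hypothetical illustration, not the realtime-branch code): every index operation draws a globally ordered ID from one atomic counter, so deciding whether a delete applies to a given add reduces to comparing two longs, with no locking beyond the atomic increment.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch of lockless operation ordering via sequence IDs.
public class SeqIds {
    private static final AtomicLong seq = new AtomicLong();

    static long nextOp() { return seq.incrementAndGet(); }

    public static void main(String[] args) {
        long deleteSeq = nextOp();
        long addSeq = nextOp();
        // A delete only masks documents whose add preceded it; here the
        // delete was issued first, so the later add survives.
        System.out.println(deleteSeq < addSeq);
    }
}
```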

Anyway, I have yet to take a closer look here. Just something that might be 
worth discussing.

 Support updateDocument() with DWPTs
 ---

 Key: LUCENE-2956
 URL: https://issues.apache.org/jira/browse/LUCENE-2956
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2956.patch


 With separate DocumentsWriterPerThreads (DWPT) it can currently happen that 
 the delete part of an updateDocument() is flushed and committed separately 
 from the corresponding new document.
 We need to make sure that updateDocument() is always an atomic operation from 
 a IW.commit() and IW.getReader() perspective.  See LUCENE-2324 for more 
 details.




[jira] [Commented] (SOLR-2466) CommonsHttpSolrServer will retry a query even if _maxRetries is 0

2011-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019021#comment-13019021
 ] 

Yonik Seeley commented on SOLR-2466:


Hmmm, that's interesting.  Does anyone know why we (CommonsHttpSolrServer) do 
retries when HttpClient already does them?  Is there an advantage to doing it 
ourselves?

 CommonsHttpSolrServer will retry a query even if _maxRetries is 0
 -

 Key: SOLR-2466
 URL: https://issues.apache.org/jira/browse/SOLR-2466
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: Tomás Fernández Löbbe

 The HttpClient library used by CommonsHttpSolrServer will retry by default 3 
 times a request that failed on the server side, even if the _maxRetries field 
 of  CommonsHttpSolrServer is set to 0.
 The retry count should be managed in just one place and CommonsHttpSolrServer 
 seems to be the right one. 
 CommonsHttpSolrServer should override that HttpClient default to 0 retries, 
 and manage the retry count with the value of the field _maxRetries.




[jira] [Commented] (SOLR-2466) CommonsHttpSolrServer will retry a query even if _maxRetries is 0

2011-04-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019028#comment-13019028
 ] 

Tomás Fernández Löbbe commented on SOLR-2466:
-

Not sure why Solr does it on CommonsHttpSolrServer. I do think it is important 
to be able to specify the exact number of retries. 

 CommonsHttpSolrServer will retry a query even if _maxRetries is 0
 -

 Key: SOLR-2466
 URL: https://issues.apache.org/jira/browse/SOLR-2466
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: Tomás Fernández Löbbe

 The HttpClient library used by CommonsHttpSolrServer will retry by default 3 
 times a request that failed on the server side, even if the _maxRetries field 
 of  CommonsHttpSolrServer is set to 0.
 The retry count should be managed in just one place and CommonsHttpSolrServer 
 seems to be the right one. 
 CommonsHttpSolrServer should override that HttpClient default to 0 retries, 
 and manage the retry count with the value of the field _maxRetries.




[jira] [Commented] (SOLR-2466) CommonsHttpSolrServer will retry a query even if _maxRetries is 0

2011-04-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019030#comment-13019030
 ] 

Hoss Man commented on SOLR-2466:


I haven't checked the code, but if I remember correctly (from another project) 
HttpClient and its RetryHandler hook are only used when dealing with 
*network* failures -- ie: connection refused, connection timeout, connection 
aborted.  If a request is a success at the TCP layer, but a failure at the HTTP 
layer (ie: 500), then you need your own retry logic external to the HttpClient.

That may be what SolrJ is doing, to account for transient errors (ie: trying to 
add during a blocking commit or something like that)
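That split can be sketched as follows (hypothetical stand-ins, not HttpClient or SolrJ code): a transport failure surfaces as an exception the client's own retry handler can see, whereas an HTTP 500 is a *completed* request, so only an outer loop can retry it.

```java
// Toy model of HTTP-level retry logic living outside the HTTP client.
// sendOnce() fakes a server that returns 500 on the first attempt.
public class TwoLayers {
    static int sendOnce(int attempt) {
        return attempt == 0 ? 500 : 200;
    }

    static int sendWithHttpLevelRetry(int maxRetries) {
        int status = 0;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            status = sendOnce(attempt);
            if (status < 500) break;   // only 5xx is treated as transient here
        }
        return status;
    }

    public static void main(String[] args) {
        System.out.println(sendWithHttpLevelRetry(0)); // no retries: stuck at 500
        System.out.println(sendWithHttpLevelRetry(1)); // one retry: recovers
    }
}
```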

 CommonsHttpSolrServer will retry a query even if _maxRetries is 0
 -

 Key: SOLR-2466
 URL: https://issues.apache.org/jira/browse/SOLR-2466
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4.1, 3.1, 4.0
Reporter: Tomás Fernández Löbbe

 The HttpClient library used by CommonsHttpSolrServer will retry by default 3 
 times a request that failed on the server side, even if the _maxRetries field 
 of  CommonsHttpSolrServer is set to 0.
 The retry count should be managed in just one place and CommonsHttpSolrServer 
 seems to be the right one. 
 CommonsHttpSolrServer should override that HttpClient default to 0 retries, 
 and manage the retry count with the value of the field _maxRetries.




An IDF variation with penalty for very rare terms

2011-04-12 Thread Earwin Burrfoot
Excuse me for somewhat of an off-topic question, but has anybody ever seen/used -subj-?
Something that looks like http://dl.dropbox.com/u/920413/IDFplusplus.png
Traditional log(N/x) tail, but when nearing zero freq, instead of
going to +inf you do a nice round bump (with controlled
height/location/sharpness) and drop down to -inf (or zero).

Should be cool when doing cosine-measure (or something
comparable) based document comparisons (e.g. in a more-like-this
query, to mention Lucene at least once :) ), over dirty data.
The rationale: most good, discriminating terms are found in at
least a certain percentage of your documents, but there are lots of
mostly unique crapterms, which at some collection sizes stop being
strictly unique and, with IDF's help, explode your scores.
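One concrete family with the shape described (my own guess at a parameterization, not necessarily what the linked plot uses) is f(x) = ln(N/x) - c/x: for x much larger than c it tracks the usual log tail, it peaks exactly at x = c, and the -c/x term drags it toward -infinity as x approaches 0.

```java
// Hedged sketch of a "bumped" IDF: log tail for common terms, a maximum
// at df = c, and a plunge for near-unique terms instead of an explosion.
public class BumpedIdf {
    static double idf(double df, double n, double c) {
        return Math.log(n / df) - c / df;
    }

    public static void main(String[] args) {
        double n = 1_000_000, c = 5;
        // the curve peaks at df = c ...
        System.out.println(idf(c, n, c) > idf(c - 1, n, c));
        System.out.println(idf(c, n, c) > idf(c + 1, n, c));
        // ... and a df=1 singleton scores below a df=100 term,
        // the opposite of plain log(N/df)
        System.out.println(idf(1, n, c) < idf(100, n, c));
    }
}
```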

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785




[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7045 - Failure

2011-04-12 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7045/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: 
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:227)




Build Log (for compile errors):
[...truncated 8822 lines...]






[jira] [Updated] (SOLR-1566) Allow components to add fields to outgoing documents

2011-04-12 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1566:
--

Attachment: SOLR-1566-PageTool.patch

Also encountered the Velocity bug. Fixed it by patching PageTool (attached).

 Allow components to add fields to outgoing documents
 

 Key: SOLR-1566
 URL: https://issues.apache.org/jira/browse/SOLR-1566
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1566-DocTransformer.patch, 
 SOLR-1566-DocTransformer.patch, SOLR-1566-DocTransformer.patch, 
 SOLR-1566-DocTransformer.patch, SOLR-1566-DocTransformer.patch, 
 SOLR-1566-DocTransformer.patch, SOLR-1566-PageTool.patch, 
 SOLR-1566-gsi.patch, SOLR-1566-rm.patch, SOLR-1566-rm.patch, 
 SOLR-1566-rm.patch, SOLR-1566-rm.patch, SOLR-1566-rm.patch, SOLR-1566.patch, 
 SOLR-1566.patch, SOLR-1566.patch, SOLR-1566.patch, SOLR-1566_parsing.patch


 Currently it is not possible for components to add fields to outgoing 
 documents which are not in the stored fields of the document.  This makes 
 it cumbersome to add computed fields/metadata.




[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7051 - Failure

2011-04-12 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7051/

9 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.cloud.ZkController.init(ZkController.java:280)
at org.apache.solr.cloud.ZkController.init(ZkController.java:133)
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:164)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:242)
at 
org.apache.solr.cloud.CloudStateUpdateTest.setUp(CloudStateUpdateTest.java:122)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /live_nodes
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
at 
org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:224)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:354)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255)
at org.apache.solr.cloud.ZkController.init(ZkController.java:275)


REGRESSION:  org.apache.solr.core.TestXIncludeConfig.testXInclude

Error Message:
org.apache.solr.common.cloud.ZooKeeperException: 

Stack Trace:
java.lang.RuntimeException: org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.util.TestHarness.init(TestHarness.java:153)
at org.apache.solr.util.TestHarness.init(TestHarness.java:135)
at org.apache.solr.util.TestHarness.init(TestHarness.java:125)
at 
org.apache.solr.util.AbstractSolrTestCase.setUp(AbstractSolrTestCase.java:132)
at 
org.apache.solr.core.TestXIncludeConfig.setUp(TestXIncludeConfig.java:55)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:183)
at 
org.apache.solr.util.TestHarness$Initializer$1.init(TestHarness.java:184)
at 
org.apache.solr.util.TestHarness$Initializer.initialize(TestHarness.java:179)
at org.apache.solr.util.TestHarness.init(TestHarness.java:140)
Caused by: java.util.concurrent.TimeoutException: Could not connect to 
ZooKeeper 127.0.0.1:25804/solr within 5000 ms
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:121)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:69)
at org.apache.solr.cloud.ZkController.init(ZkController.java:104)
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:164)


FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.component.QueryElevationComponentTest

Error Message:
org.apache.solr.common.cloud.ZooKeeperException: 

Stack Trace:
java.lang.RuntimeException: org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.util.TestHarness.init(TestHarness.java:153)
at org.apache.solr.util.TestHarness.init(TestHarness.java:135)
at org.apache.solr.util.TestHarness.init(TestHarness.java:125)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:238)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:101)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:89)
at 
org.apache.solr.handler.component.QueryElevationComponentTest.beforeClass(QueryElevationComponentTest.java:48)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
at 
org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:183)
at 
org.apache.solr.util.TestHarness$Initializer$1.init(TestHarness.java:184)
at 
org.apache.solr.util.TestHarness$Initializer.initialize(TestHarness.java:179)
at org.apache.solr.util.TestHarness.init(TestHarness.java:140)
Caused by: java.util.concurrent.TimeoutException: Could not connect to 
ZooKeeper 127.0.0.1:25804/solr within 5000 ms
at 

[HUDSON] Lucene-trunk - Build # 1528 - Still Failing

2011-04-12 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1528/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521)




Build Log (for compile errors):
[...truncated 11900 lines...]






[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7043 - Failure

2011-04-12 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7043/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability

Error Message:
No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:222)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
org.apache.solr.client.solrj.TestLBHttpSolrServer.waitForServer(TestLBHttpSolrServer.java:189)
at 
org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:182)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.net.SocketTimeoutException: Read timed out
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:484)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:206)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:428)




Build Log (for compile errors):
[...truncated 10823 lines...]






Re: Patch for http_proxy support in solr-ruby client

2011-04-12 Thread Otis Gospodnetic
Hi,

Hm, maybe you are asking where solr-ruby actually lives and is being developed?
I'm not sure.  I see it under solr/client/ruby/solr-ruby (no new development in 
ages?), but I also see an *active* solr-ruby fork over on 
https://github.com/bbcrd/solr-ruby .  So if you want to contribute to solr-ruby 
on Github, get yourself a Github account, fork that solr-ruby, make your change, 
and submit it via a pull request.  This is separate from Solr @ Apache.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Duncan Robertson duncan.robert...@bbc.co.uk
 To: dev@lucene.apache.org
 Sent: Tue, April 12, 2011 4:36:17 AM
 Subject: Patch for http_proxy support in solr-ruby client
 
 Hi,
 
 I have a patch for adding http_proxy support to the solr-ruby client.  I
 thought the project was managed via Github, but this turns out not to be the
 case. Is the process the same as for Solr itself?
 
 https://github.com/bbcrd/solr-ruby/compare/5b06e66f4e%5E...a76aee983e
 
 Best,
 Duncan
 
 
 http://www.bbc.co.uk/
 This e-mail (and any attachments) is confidential and may contain personal 
 views which are not the views of the BBC unless specifically stated.
 If you have received it in error, please delete it from your system.
 Do not use, copy or disclose the information in any way nor act in reliance 
 on it and notify the sender immediately.
 Please note that the BBC monitors e-mails sent or received.
 Further communication will signify your consent to this.
  
 
 
