date:20110117

On Sun, Jan 16, 2011 at 10:09 PM, Erick Erickson
erickerick...@gmail.com wrote:
 I'm working on a patch for SOLR-445, and it's near completion. The problem
 is that I'm getting weird test failures. TestDistributedSearch fails *only*
 when run as part of the full ant test, *not* when I run it either from the
 command line (-Dtestcase=) or from within IntelliJ.
 So I assume it's some interesting interaction between some previous test
 and the one in question. Before I go and try to figure it out, does anyone
 have any wisdom to offer as to
 1 how to go about tracking it down?

when the test fails, you should see something like this (assuming TestG):
[junit] NOTE: all tests run in this JVM:
[junit] [TestA, TestB, TestC, TestD, TestE, TestF, TestG]

So then hack your build.xml file, remove the junit definition for
testpackage and replace it with
batchtest fork=yes todir=${junit.output.dir} if=testpackage
  fileset dir=src/test includes=**/TestA* **/TestB* **/TestC*
**/TestD* **/TestE* **/TestF* **/TestG*/
/batchtest

now you can run just this group in a single thread with
-Dtestpackage=1 -Dtests.threadspercpu=1
as long as the test fails, basically binary search the list to find
the offending test, by editing testpackage
e.g. you should be able to reduce it to D,E,F,G, then D,E,G, then D,G
to find out it was D that was the problem interfering with G.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982594#action_12982594
 ] 

Robert Muir commented on LUCENE-2295:
-

Hi Shai, that sounds like the right solution to me!


 Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the 
 same functionality as MaxFieldLength provided on IndexWriter
 ---

 Key: LUCENE-2295
 URL: https://issues.apache.org/jira/browse/LUCENE-2295
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Shai Erera
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch


 A spinoff from LUCENE-2294. Instead of asking the user to specify on 
 IndexWriter his requested MFL limit, we can get rid of this setting entirely 
 by providing an Analyzer which will wrap any other Analyzer and its 
 TokenStream with a TokenFilter that keeps track of the number of tokens 
 produced and stop when the limit has reached.
 This will remove any count tracking in IW's indexing, which is done even if I 
 specified UNLIMITED for MFL.
 Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Michael McCandless

On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 But: they don't yet support updating the values (the goal is to allow
 this, eventually).  This is just the first step.

 No?  Hmm... I thought that was a main part of the functionality?

Patches welcome ;)

Seriously, how would you do it?  IE, I don't like how norms handle it
today -- on changing a single value we must write the full array (for
all docs).  Same problem w/ del docs, though since its 1 bit per doc
the cost is far less.

Better would be a stacked approach, where the orig full array remains
and we write sparse deltas (pairs of docID + new value), and at init
we load the base and apply all the diffs (in order).  Merging would
periodically coalesce them down again...

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs


[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982597#action_12982597
 ] 

Michael McCandless commented on LUCENE-2666:


OK thanks.  Hopefully we can catch this under infoStream's watch.

Not calling prepareCommit is harmless -- IW simply calls it for you under the 
hood when commit() is called, if you hadn't already called prepareCommit().

The two APIs are separate in case you want to involve Lucene in a 2 phased 
commit w/ other resources.

 ArrayIndexOutOfBoundsException when iterating over TermDocs
 ---

 Key: LUCENE-2666
 URL: https://issues.apache.org/jira/browse/LUCENE-2666
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.2
Reporter: Shay Banon

 A user got this very strange exception, and I managed to get the index that 
 it happens on. Basically, iterating over the TermDocs causes an AAOIB 
 exception. I easily reproduced it using the FieldCache which does exactly 
 that (the field in question is indexed as numeric). Here is the exception:
 Exception in thread main java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
   at 
 org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
   at 
 org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
   at 
 org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
   at TestMe.main(TestMe.java:56)
 It happens on the following segment: _26t docCount: 914 delCount: 1 
 delFileName: _26t_1.del
 And as you can see, it smells like a corner case (it fails for document 
 number 912, the AIOOB happens from the deleted docs). The code to recreate it 
 is simple:
 FSDirectory dir = FSDirectory.open(new File(index));
 IndexReader reader = IndexReader.open(dir, true);
 IndexReader[] subReaders = reader.getSequentialSubReaders();
 for (IndexReader subReader : subReaders) {
 Field field = 
 subReader.getClass().getSuperclass().getDeclaredField(si);
 field.setAccessible(true);
 SegmentInfo si = (SegmentInfo) field.get(subReader);
 System.out.println(--  + si);
 if (si.getDocStoreSegment().contains(_26t)) {
 // this is the probleatic one...
 System.out.println(problematic one...);
 FieldCache.DEFAULT.getLongs(subReader, __documentdate, 
 FieldCache.NUMERIC_UTILS_LONG_PARSER);
 }
 }
 Here is the result of a check index on that segment:
   8 of 10: name=_26t docCount=914
 compound=true
 hasProx=true
 numFiles=2
 size (MB)=1.641
 diagnostics = {optimize=false, mergeFactor=10, 
 os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, 
 lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, 
 os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
 has deletions [delFileName=_26t_1.del]
 test: open reader.OK [1 deleted docs]
 test: fields..OK [32 fields]
 test: field norms.OK [32 fields]
 test: terms, freq, prox...ERROR [114]
 java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
   at 
 org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
   at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
   at TestMe.main(TestMe.java:47)
 test: stored fields...ERROR [114]
 java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
   at 
 org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
   at TestMe.main(TestMe.java:47)
 test: term vectorsERROR [114]
 java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
   at 
 org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
   at

[jira] Resolved: (LUCENE-2768) add infrastructure for longer running nightly test cases


 [ 
https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2768.


   Resolution: Fixed
Fix Version/s: 3.1

I think this is fixed.

 add infrastructure for longer running nightly test cases
 

 Key: LUCENE-2768
 URL: https://issues.apache.org/jira/browse/LUCENE-2768
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: europarl.lines.txt.gz, europarl.py, LUCENE-2768.patch, 
 LUCENE-2768.patch, LUCENE-2768.patch, LUCENE-2768_nightly.patch, 
 LUCENE-2768_nightly.patch


 I'm spinning this out of LUCENE-2762...
 The patch there adds initial infrastructure for tests to pull documents from 
 a line file, and adds a longish running test case using that line file to 
 test NRT.
 I'd like to see some tests run on more substantial indices based on real 
 data... so this is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

[
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982604#action_12982604
]

Michael McCandless commented on LUCENE-2474:

Ahh, I get it -- invoking the listeners (on cache evict) is dangerous to do
under a global lock since they could conceivably be costly.

I had switched to Set to try to prevent silliness in the event that an app adds
same listener over over (w/o removing it), and also to not have O(N^2) cost
when removing listeners. I mean, it is an expert API, but I still think we
should attempt to be defensive against silliness?

How about CHM? (There is not builtin CHS, right? And HS just wraps an HM
anyway).

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean
custom caches that use the IndexReader (getFieldCacheKey)

Key: LUCENE-2474
URL: https://issues.apache.org/jira/browse/LUCENE-2474
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Shay Banon
Attachments: LUCENE-2474.patch, LUCENE-2474.patch

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean
custom caches that use the IndexReader (getFieldCacheKey).
A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its
make a lot of sense to cache things based on IndexReader#getFieldCacheKey,
even Lucene itself uses it, for example, with the CachingWrapperFilter.
FieldCache enjoys being called explicitly to purge its cache when possible
(which is tricky to know from the outside, especially when using NRT -
reader attack of the clones).
The provided patch allows to plug a CacheEvictionListener which will be
called when the cache should be purged for an IndexReader.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2738) improve test coverage for omitNorms and omitTFAP


[ 
https://issues.apache.org/jira/browse/LUCENE-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982607#action_12982607
 ] 

Robert Muir commented on LUCENE-2738:
-

Mike just reminded me about this one:
My concern for not committing is that we would actually reduce test coverage,
because most tests will create say field foobar in a loop like this:
{noformat}
for () {
   newField(foobar);
}
{noformat}

So because removing norms/omitTFAP is infectious, i think we will end out
only testing certain cases... unless we change the patch so that this random 
value
is remembered per field name during the length of the test... i think thats 
the
right solution (adding hashmap)

 improve test coverage for omitNorms and omitTFAP
 

 Key: LUCENE-2738
 URL: https://issues.apache.org/jira/browse/LUCENE-2738
 Project: Lucene - Java
  Issue Type: Test
  Components: Build
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2738.patch, LUCENE-2738.patch, LUCENE-2738.patch


 just expands on what lucenetestcase does already...
 if you say Analyzed_NO_NORMS, we might set norms anyway.
 in the same sense, if you say Index.NO, we might index it anyway, and might 
 set omitTFAP etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

[
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir resolved SOLR-1677.
---

Resolution: Fixed

I think this issue has been resolved for some time.

Add support for o.a.lucene.util.Version for BaseTokenizerFactory and
BaseTokenFilterFactory
---

Key: SOLR-1677
URL: https://issues.apache.org/jira/browse/SOLR-1677
Project: Solr
Issue Type: Sub-task
Components: Schema and Analysis
Reporter: Uwe Schindler
Fix For: 3.1, 4.0

Attachments: SOLR-1677-lucenetrunk-branch-2.patch,
SOLR-1677-lucenetrunk-branch-3.patch, SOLR-1677-lucenetrunk-branch.patch,
SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch

Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards
compatibility with old indexes created using older versions of Lucene. The
most important example is StandardTokenizer, which changed its behaviour with
posIncr and incorrect host token types in 2.4 and also in 2.9.
In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with
much more Unicode support, almost every Tokenizer/TokenFilter needs this
Version parameter. In 2.9, the deprecated old ctors without Version take
LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
This patch adds basic support for the Lucene Version property to the base
factories. Subclasses then can use the luceneMatchVersion decoded enum (in
3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently
contains a helper map to decode the version strings, but in 3.0 is can be
replaced by Version.valueOf(String), as the Version is a subclass of Java5
enums. The default value is Version.LUCENE_24 (as this is the default for the
no-version ctors in Lucene).
This patch also removes unneeded conversions to CharArraySet from
StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed
to match Lucene 3.0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (SOLR-2169) QueryElevationComponentTest.testInterface test failure


 [ 
https://issues.apache.org/jira/browse/SOLR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2169.
---

Resolution: Not A Problem

Marking not a problem, appears to be fixed with the solr test cleanup.

 QueryElevationComponentTest.testInterface test failure
 --

 Key: SOLR-2169
 URL: https://issues.apache.org/jira/browse/SOLR-2169
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 3.1, 4.0
 Environment: Hudson
Reporter: Robert Muir
 Fix For: 3.1, 4.0


 Stacktrace:
 {noformat}
 [junit] Testsuite: 
 org.apache.solr.handler.component.QueryElevationComponentTest
 [junit] Testcase: 
 testInterface(org.apache.solr.handler.component.QueryElevationComponentTest): 
   Caused an ERROR
 [junit] Exception during query
 [junit] java.lang.RuntimeException: Exception during query
 [junit]   at 
 org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:343)
 [junit]   at 
 org.apache.solr.handler.component.QueryElevationComponentTest.testInterface(QueryElevationComponentTest.java:100)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:873)
 [junit]   at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:840)
 [junit] Caused by: java.lang.RuntimeException: REQUEST FAILED: 
 xpath=//*[@numFound='0']
 [junit]   xml response was: ?xml version=1.0 encoding=UTF-8?
 [junit] response
 [junit] lst name=responseHeaderint name=status0/intint 
 name=QTime3/intlst name=paramsstr name=q.alt*:*/strstr 
 name=qt/elevate/strstr name=defTypedismax/str/lst/lstresult 
 name=response numFound=6 start=0docstr name=ida/strarr 
 name=str_sstra/str/arrstr name=titleipod/str/docdocstr 
 name=idb/strarr name=str_sstrb/str/arrstr name=titleipod 
 ipod/str/docdocstr name=idc/strarr 
 name=str_sstrc/str/arrstr name=titleipod ipod 
 ipod/str/docdocstr name=idx/strarr 
 name=str_sstrx/str/arrstr name=titleboosted/str/docdocstr 
 name=idy/strarr name=str_sstry/str/arrstr 
 name=titleboosted boosted/str/docdocstr name=idz/strarr 
 name=str_sstrz/str/arrstr name=titleboosted boosted 
 boosted/str/doc/result
 [junit] /response
 [junit] 
 [junit]   request was:q.alt=*:*qt=/elevatedefType=dismax
 [junit]   at 
 org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:336)
 [junit] 
 [junit] 
 [junit] Tests run: 3, Failures: 0, Errors: 1, Time elapsed: 0.581 sec
 [junit] 
 [junit] - Standard Output ---
 [junit] NOTE: reproduce with: ant test 
 -Dtestcase=QueryElevationComponentTest -Dtestmethod=testInterface 
 -Dtests.seed=8921358208309552689:278255616409435903
 [junit] NOTE: test params are: codec=MockSep, locale=fr, 
 timezone=America/Indiana/Vevay
 [junit] -  ---
 [junit] - Standard Error -
 [junit] 17 oct. 2010 04:10:28 org.apache.solr.SolrTestCaseJ4 assertQ
 [junit] GRAVE: REQUEST FAILED: xpath=//*[@numFound='0']
 [junit]   xml response was: ?xml version=1.0 encoding=UTF-8?
 [junit] response
 [junit] lst name=responseHeaderint name=status0/intint 
 name=QTime3/intlst name=paramsstr name=q.alt*:*/strstr 
 name=qt/elevate/strstr name=defTypedismax/str/lst/lstresult 
 name=response numFound=6 start=0docstr name=ida/strarr 
 name=str_sstra/str/arrstr name=titleipod/str/docdocstr 
 name=idb/strarr name=str_sstrb/str/arrstr name=titleipod 
 ipod/str/docdocstr name=idc/strarr 
 name=str_sstrc/str/arrstr name=titleipod ipod 
 ipod/str/docdocstr name=idx/strarr 
 name=str_sstrx/str/arrstr name=titleboosted/str/docdocstr 
 name=idy/strarr name=str_sstry/str/arrstr 
 name=titleboosted boosted/str/docdocstr name=idz/strarr 
 name=str_sstrz/str/arrstr name=titleboosted boosted 
 boosted/str/doc/result
 [junit] /response
 [junit] 
 [junit]   request was:q.alt=*:*qt=/elevatedefType=dismax
 [junit] 17 oct. 2010 04:10:28 org.apache.solr.common.SolrException log
 [junit] GRAVE: REQUEST FAILED: 
 q.alt=*:*qt=/elevatedefType=dismax:java.lang.RuntimeException: REQUEST 
 FAILED: xpath=//*[@numFound='0']
 [junit]   xml response was: ?xml version=1.0 encoding=UTF-8?
 [junit] response
 [junit] lst name=responseHeaderint name=status0/intint 
 name=QTime3/intlst name=paramsstr name=q.alt*:*/strstr 
 name=qt/elevate/strstr name=defTypedismax/str/lst/lstresult 
 name=response numFound=6 start=0docstr name=ida/strarr 
 name=str_sstra/str/arrstr name=titleipod/str/docdocstr 
 name=idb/strarr name=str_sstrb/str/arrstr name=titleipod 
 ipod/str/docdocstr name=idc/strarr 
 name=str_sstrc/str/arrstr name=titleipod ipod 
 ipod/str/docdocstr name=idx/strarr 
 name=str_sstrx/str/arrstr

Re: Odd test failure, looking for pointers.

2011-01-17 Thread Erick Erickson

Robert:

Thanks, I had a general idea that was the approach, but it's great to
have someone point the way in detail...

Erick

On Mon, Jan 17, 2011 at 5:48 AM, Robert Muir rcm...@gmail.com wrote:

 On Sun, Jan 16, 2011 at 10:09 PM, Erick Erickson
 erickerick...@gmail.com wrote:
  I'm working on a patch for SOLR-445, and it's near completion. The
 problem
  is that I'm getting weird test failures. TestDistributedSearch fails
 *only*
  when run as part of the full ant test, *not* when I run it either from
 the
  command line (-Dtestcase=) or from within IntelliJ.
  So I assume it's some interesting interaction between some previous
 test
  and the one in question. Before I go and try to figure it out, does
 anyone
  have any wisdom to offer as to
  1 how to go about tracking it down?

 when the test fails, you should see something like this (assuming TestG):
[junit] NOTE: all tests run in this JVM:
[junit] [TestA, TestB, TestC, TestD, TestE, TestF, TestG]

 So then hack your build.xml file, remove the junit definition for
 testpackage and replace it with
 batchtest fork=yes todir=${junit.output.dir} if=testpackage
  fileset dir=src/test includes=**/TestA* **/TestB* **/TestC*
 **/TestD* **/TestE* **/TestF* **/TestG*/
 /batchtest

 now you can run just this group in a single thread with
 -Dtestpackage=1 -Dtests.threadspercpu=1
 as long as the test fails, basically binary search the list to find
 the offending test, by editing testpackage
 e.g. you should be able to reduce it to D,E,F,G, then D,E,G, then D,G
 to find out it was D that was the problem interfering with G.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Odd test failure, looking for pointers.

On Mon, Jan 17, 2011 at 7:40 AM, Erick Erickson erickerick...@gmail.com wrote:
 Robert:
 Thanks, I had a general idea that was the approach, but it's great to
 have someone point the way in detail...
 Erick


another thing to consider, it might not be test meddling at all.
it might just be some concurrency bug, and when running the full 'ant
test' your machine is busier because of multiple JVMs going at the
same time...

so you can also try making your computer really busy (e.g. running the
lucene tests) and at the same time running the test by itself.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Shai Erera

This sounds like incremental field updates :).

Shai

On Mon, Jan 17, 2011 at 1:24 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
  But: they don't yet support updating the values (the goal is to allow
  this, eventually).  This is just the first step.
 
  No?  Hmm... I thought that was a main part of the functionality?

 Patches welcome ;)

 Seriously, how would you do it?  IE, I don't like how norms handle it
 today -- on changing a single value we must write the full array (for
 all docs).  Same problem w/ del docs, though since its 1 bit per doc
 the cost is far less.

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value), and at init
 we load the base and apply all the diffs (in order).  Merging would
 periodically coalesce them down again...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (SOLR-2259) Improve analyzer/version handling in Solr

[
https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated SOLR-2259:
--

Attachment: SOLR-2259part4.patch

Here is the patch for the last part, part 4.

I added a warnDeprecated() helper method to the base class,
and added messages for all deprecated classes in trunk.

Improve analyzer/version handling in Solr
-

Key: SOLR-2259
URL: https://issues.apache.org/jira/browse/SOLR-2259
Project: Solr
Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.1, 4.0

Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259_part3.patch,
SOLR-2259part2.patch, SOLR-2259part4.patch

We added Version for backwards compatibility support in Lucene.
We use this to fire deprecated code to emulate old version to ensure index
backwards compat.
Related: we deprecate old analysis components and eventually remove them.
To hook into Solr, at first it defaulted to Version 2.4 emulation everywhere,
with the example having the latest.
if you don't specify a version in your solrconfig, it defaults to 2.4 though.
However, as of LUCENE-2781 2.4 is removed: but users with old configs that
don't specify a version should not be silently upgraded to the Version 3.0
emulation... this is bad.
Additionally, when users are using deprecated emulation or using deprecated
factories they might not know it, and it might come as a surprise if they
upgrade, especially if they arent looking at java apis or java code.
I propose:
# in trunk: we make the solrconfig luceneMatchVersion mandatory.
This is simple: Uwe already has a method that will error out if its not
present, we just use that.
# in 3.x: we warn if you don't specify luceneMatchVersion in solrconfig:
telling you that its going to be required in 4.0 and that you are defaulting
to 2.4 emulation.
For example: Warning: luceneMatchVersion is not specified in solrconfig.xml.
Defaulting to 2.4 emulation. You should at some point declare and reindex to
at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed
in 4.0. This parameter will be mandatory in 4.0.
# in 3.x,trunk: we warn if you are using a deprecated matchVersion constant
somewhere in general, even for a specific tokenizer, telling you that you
need to at some point reindex with a current version before you can move to
the next release.
For example: Warning: you are using 2.4 emulation, at some point you need to
bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x
and will be removed in 4.0
# in 3.x,trunk: we warn if you are using a deprecated TokenStreamFactory so
that you know its going to be removed.
For example: Warning: the ISOLatin1FilterFactory is deprecated and will be
removed in the next release. You should migrate to ASCIIFoldingFilterFactory.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

[
https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982635#action_12982635
]

Robert Muir commented on LUCENE-2236:
-

bq. Is that too bad?

well my concern about the deprecated methods is we get into the hairy backwards
compat situation...
we already had issues with this with Similarity.

It might be ok to essentially fix Similarity to be the way we want for 4.0
(break it) since its an expert API anyway.
This patch was just a quick stab...
I definitely agree with you about the name though, i prefer Similarity.

bq. should Sim be aware of for which field it was created, so that no need to
pass it as parameter in its methods in case this is ever important?

Well honestly I think what you are saying is really needed for the future...
but I would prefer to actually delay that until a future patch :)

Making an optimized TermScorer is becoming more and more complicated, see the
one in the bulkpostings branch for example. Because of this,
its extremely tricky to customize the scoring with good performance. I think
the score caching etc in term scorer needs to be moved out of TermScorer,
instead the responsibility of calculating the score should reside in
Similarity, including any caching it needs to do (which is really impl
dependent).
Basically Similarity needs to be responsible for score(), but let TermScorer
etc deal with enumerating postings etc.

For example, we now have the stats totalTermFreq/totalCollectionFreq by field
for a term, but you can't e.g. take these and make a
Language-modelling based scorer, which you should be able to do *right now*,
except for limitations in our APIs.

So in a future issue I would like to propose a patch to do just this, so that
TermScorer, for example is more general. Similarity would need to be able
to 'setup' a query (e.g. things like IDF, building score caches for the query,
whatever), and then also score an individual document.

In the flexible scoring prototype this is what we did, but we went even
further, where a Similarity is also responsible for 'setting up' a searcher,
too.
So that means, its responsible for managing norm byte[] (in that patch, you
only had a byte[] norms, if you made it in your Similarity yourself).
I think long term that approach is definitely really interesting, but I think
we can go ahead and make scoring a lot more flexible in tiny steps
like this without rewriting all of lucene in one enormous patch... and this is
safer as we can benchmark performance each step of the way.

Similarity can only be set per index, but I may want to adjust scoring
behaviour at a field level
-

Key: LUCENE-2236
URL: https://issues.apache.org/jira/browse/LUCENE-2236
Project: Lucene - Java
Issue Type: Improvement
Components: Query/Scoring
Affects Versions: 3.0
Reporter: Paul taylor
Assignee: Robert Muir
Attachments: LUCENE-2236.patch

Similarity can only be set per index, but I may want to adjust scoring
behaviour at a field level, to faciliate this could we pass make field name
available to all score methods.
Currently it is only passed to some such as lengthNorm() but not others such
as tf()

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Gregor Heinrich


Hi Mike, all --

a (sorrily slow) thanks for this response ;)

From the ensuing discussion, it sounds like there's a LOT to be in v4, and not 
raising wrong expectation by giving dates is appreciated ;)


Only thing is, are we talking any time in 2012 or 2011, just to have a 
coarse-grained estimate without any assumptions attached?


Best

gregor





On 1/15/11 3:20 PM, Michael McCandless wrote:

This is unfortunately hard to say!

There's tons of good stuff in 4.0, so we'd really like to release
sooner rather than later.

But then there's also alot of work remaining, eg we have 3 feature
branches in flight right now, that we need to wrap up and land on
trunk:

   * realtime (gives us concurrent flushing during indexing)

   * docvalues (adds column-stride fields)

   * bulkpostings (gives good search speedup for intblock codecs)

Plus many open Jira issues.  So it's hard to predict when all of this
will be done

Mike

On Fri, Jan 14, 2011 at 12:31 PM, Gregor Heinrichgre...@arbylon.net  wrote:

Dear Lucene team,

I am wondering whether there is an updated Lucene release schedule for the
v4.0 stream.

Any earliest/latest alpha/beta/stable date? And if not yet, where to track
such info?

Thanks in advance from Germany

gregor

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Simon Willnauer

On Mon, Jan 17, 2011 at 12:24 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 But: they don't yet support updating the values (the goal is to allow
 this, eventually).  This is just the first step.

 No?  Hmm... I thought that was a main part of the functionality?

 Patches welcome ;)

 Seriously, how would you do it?  IE, I don't like how norms handle it
 today -- on changing a single value we must write the full array (for
 all docs).  Same problem w/ del docs, though since its 1 bit per doc
 the cost is far less.

For some implemenations writing the value directly would be possible
though. For instance for StraightFixedBytes and maybe DerefFixedBytes
(depending on how its indexed) we could do change the value without
writing the entire array. Yet, this would violate the write once
policy! Having this feature in Lucene and having them updateable are
babystep vs. dream - jason, again Patches welcome but let us first
land it on trunk.

simon

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value), and at init
 we load the base and apply all the diffs (in order).  Merging would
 periodically coalesce them down again...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (SOLR-2287) (SolrCloud) Allow users to query by multiple collections

2011-01-17 Thread Alex Cowell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Cowell updated SOLR-2287:
--

Attachment: SOLR-2287.patch

Added a test class which tests basic functionality for 3 collections but should 
be expanded upon.

 (SolrCloud) Allow users to query by multiple collections
 

 Key: SOLR-2287
 URL: https://issues.apache.org/jira/browse/SOLR-2287
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Soheb Mahmood
Priority: Minor
 Attachments: SOLR-2287.patch, SOLR-2287.patch


 This code fixes the todo items mentioned on the SolrCloud wiki:
 -optionally allow user to query by collection
 -optionally allow user to query by multiple collections (assume schemas are 
 compatible)
 We are going to put a patch to see if anyone has any trouble with this code 
 and/or if there is any comments on how to improve this code.
 Unfortunately, as of now, we don't have a test class as we are working on it. 
 We are sorry about this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2011-01-17 Thread Salman Akram (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982651#action_12982651
 ] 

Salman Akram commented on SOLR-1604:


I am trying to use CommonGrams with this patch but doesn't seem to work. 

If I don't add {!complexphrase} it uses CommonGramsQueryFilterFactory and 
proper bi-grams are made but of course doesn't use this patch.

If I add {!complexphrase} it simply does it the old way i.e. ignore CommonGrams.

Can you please help how can I combine both these features?



 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: Next

 Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Jason Rutherglen

 Seriously, how would you do it?

Ah, for LUCENE-2312 we don't need to update existing values, we only
need to make additions, ie, it's not the general use case.  I got the
impression that DocValues should be used instead of CSF?  Does CSF
replace the FieldCache usage entirely?

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value)

What is the lookup cost using this method?

On Mon, Jan 17, 2011 at 3:24 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 But: they don't yet support updating the values (the goal is to allow
 this, eventually).  This is just the first step.

 No?  Hmm... I thought that was a main part of the functionality?

 Patches welcome ;)

 Seriously, how would you do it?  IE, I don't like how norms handle it
 today -- on changing a single value we must write the full array (for
 all docs).  Same problem w/ del docs, though since its 1 bit per doc
 the cost is far less.

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value), and at init
 we load the base and apply all the diffs (in order).  Merging would
 periodically coalesce them down again...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import


[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982216#action_12982216
 ] 

Gunnlaugur Thor Briem edited comment on SOLR-1191 at 1/17/11 10:23 AM:
---

I added a patch to resolve this. It resolves deltaQuery columns against pk when 
they differ by prefix (and report error more helpfully when no column matches, 
or more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.

  was (Author: gthb):
Patch to resolve this. It resolves deltaQuery columns against pk when they 
differ by prefix (and report error more helpfully when no column matches, or 
more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.
  
 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows  Linux.
 Java: 1.6
 DB: MySQL  SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
 Here is the log just before  after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-importoptimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for

[jira] Updated: (SOLR-1191) NullPointerException in delta import


 [ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnlaugur Thor Briem updated SOLR-1191:


Comment: was deleted

(was: Neglected to mention: that patch is against branch_3x.)

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows  Linux.
 Java: 1.6
 DB: MySQL  SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
 Here is the log just before  after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-importoptimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: user rows obtained : 0
 05/27 11:59:29 86987873 INFO  Thread-4162

[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import


[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982216#action_12982216
 ] 

Gunnlaugur Thor Briem edited comment on SOLR-1191 at 1/17/11 10:24 AM:
---

I added a patch against branch_3x to resolve this. It resolves deltaQuery 
columns against pk when they differ by prefix (and report error more helpfully 
when no column matches, or more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.

  was (Author: gthb):
I added a patch to resolve this. It resolves deltaQuery columns against pk 
when they differ by prefix (and report error more helpfully when no column 
matches, or more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.
  
 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows  Linux.
 Java: 1.6
 DB: MySQL  SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
 Here is the log just before  after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-importoptimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running

[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import


[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982216#action_12982216
 ] 

Gunnlaugur Thor Briem edited comment on SOLR-1191 at 1/17/11 10:25 AM:
---

I added a patch against branch_3x to resolve this. It resolves deltaQuery 
columns against pk when they differ by prefix (and reports the error more 
helpfully when no column matches, or more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.

  was (Author: gthb):
I added a patch against branch_3x to resolve this. It resolves deltaQuery 
columns against pk when they differ by prefix (and report error more helpfully 
when no column matches, or more than one column matches).

No unit test, sorry (but there's not much deltaQuery coverage anyway). All 
existing unit tests pass, and this is working fine for me in production.
  
 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
 Environment: OS: Windows  Linux.
 Java: 1.6
 DB: MySQL  SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
 Here is the log just before  after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
 params={command=delta-importoptimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162

Highlighting overlapping tokens

2011-01-17 Thread Pierre GOSSE

Hi all,

I'm having an issue when highlighting fields that have overlapping tokens. 
There was a bug opened in Jira some year ago 
https://issues.apache.org/jira/browse/LUCENE-627 but I'm a bit confused about 
this. In jira bug's status is resolved, but still I got the exact same 
problem with a genuine lucene 2.9.3.

Looking for what was going on, I checked 
org.apache.lucene.search.highlight.TokenSources that rebuilds a tokenStream 
from TermVectors and I found that token where not sorted by offset, as one 
would expect.

When sorting tokens, the following comparer is used :

public int compare(Object o1, Object o2)
{
Token t1=(Token) o1;
Token t2=(Token) o2;
if(t1.startOffset()t2.endOffset())
return 1;
if(t1.startOffset()t2.startOffset())
return -1;
return 0;
}

I'm not sure why endOffset is used instead of startOffset in first test (looks 
like a typo), and with non-overlapping token this works just fine. 

But with overlapping tokens longest token get pushed to the end of their 
overlapping zone : (big,3,6), (fish,7,11), ({big fish},3,11) would end up 
sorted in this exact order, where I would have expected (big,3,6) ({big 
fish},3,11) (fish,7,11) or ({big fish},3,11) (big,3,6) (fish,7,11).
Highligthing with the term {big fish} builds a fragment by concatenating 
big, {big fish}, and fish, giving this phrase : bigembig fish/em 
fish.

I tested a quick fix by having preceding comparer changed like this :

public int compare(Object o1, Object o2)
{
Token t1 = (Token)o1;
Token t2 = (Token)o2;
if (t1.startOffset()  t2.startOffset())
return 1;
if (t1.startOffset()  t2.startOffset())
return -1;
if (t1.endOffset()  t2.endOffset())
return -1;
if (t1.endOffset()  t2.endOffset())
return 1;
return 0;
}

Highlight behavior is now correct as far as I tested it. 

Maybe the original sorting order has a purpose I don't understand, but to me 
this slight modification seams to fix everything. What should I do ? (I'm very 
new to this list and this community). 

If someone with better understanding of lucene highlight could give me some 
feedback, I would be grateful.

Thanks for your time.

Pierre


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Query parser contract changes?

2011-01-17 Thread karl.wright

Hi folks,

I'm sorely puzzled by the fact that my QParser implementation ceased to work 
after the latest Solr/Lucene trunk update.  My previous update was about ten 
days ago, right after Mike made his index changes.

The symptom is that, although the query parser is correctly called, and seems 
to have the right arguments, the Query it is returning seems to be ignored.  I 
always get zero results.  I eliminated any possibility of error by just 
hardwiring the return of a TermQuery, and that too always yields zero results.

I was able to confirm, using the standard handler with the default query 
parser, that the index is in fine shape.  So I was wondering if the contract 
for QParser had changed in some subtle way that I missed?

Karl

Re: Release schedule Lucene 4?

2011-01-17 Thread Jason Rutherglen

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value), and at init
 we load the base and apply all the diffs (in order).  Merging would
 periodically coalesce them down again...

I think this approach would be great for the DF in RT.   It's better
than a multidimensional array?  As the lookup cost won't be too high,
and we can instantiate a new main int[] every N.   I'll enumerate the
options we've gone over in the LUCENE-2312 issue, so we don't forget!

On Mon, Jan 17, 2011 at 3:24 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 But: they don't yet support updating the values (the goal is to allow
 this, eventually).  This is just the first step.

 No?  Hmm... I thought that was a main part of the functionality?

 Patches welcome ;)

 Seriously, how would you do it?  IE, I don't like how norms handle it
 today -- on changing a single value we must write the full array (for
 all docs).  Same problem w/ del docs, though since its 1 bit per doc
 the cost is far less.

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value), and at init
 we load the base and apply all the diffs (in order).  Merging would
 periodically coalesce them down again...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl


 [ 
https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2374:
--

Attachment: LUCENE-2374-3x.patch

Here a first patch with the proposed API (thanks Earwin).

The patch is for 3.x, as it contains already the sophisticated(TM) backwards 
compatibility layer (see javadocs).

Still missing:
- Remove obsolete toString in contrib/queryparser
- Test for sophisticated bw
- Tests for API in general
- an AttributeChecker test class that checks basic Attribute features and its 
implementation (copyTo, reflectAsString,...)
- Solr changes to make use of this API in analysis.jsp and the other 
TokenStream component

What do you think?

 Add introspection API to AttributeSource/AttributeImpl
 --

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2374-3x.patch


 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose an simple API that get a default 
 implementation in AttributeImpl (just like toString() current):
 - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes its a singleton) of a key-value pair, e.g. 
 term-foobar,startOffset-Integer.valueOf(0),...
 - AttributeSource gets the same method, it just concat the iterators of each 
 getAttributeImplsIterator() AttributeImpl
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets iterator and lists), but we simply remove the 
 documentation for the format. (Char)TermAttribute gets a special impl fo 
 toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory


[ 
https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982675#action_12982675
 ] 

Uwe Schindler commented on LUCENE-2832:
---

I would suggest to use a different default for Win64, as the adress space is 
not as small as with 32 bit. How about something like 4 GB or 16 GB?

Also, for 32bit we use 1/8 of possible address space, so why not the same (1/8) 
for win64?

 on Windows 64-bit, maybe we should default to a better maxBBufSize in 
 MMapDirectory
 ---

 Key: LUCENE-2832
 URL: https://issues.apache.org/jira/browse/LUCENE-2832
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2832.patch


 Currently the default max buffer size for MMapDirectory is 256MB on 32bit and 
 Integer.MAX_VALUE on 64bit:
 {noformat}
 public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? 
 Integer.MAX_VALUE : (256 * 1024 * 1024);
 {noformat}
 But, in windows on 64-bit, you are practically limited to 8TB. This can cause 
 problems in extreme cases, such as: 
 http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher
 Perhaps it would be good to change this default such that its 256MB on 32Bit 
 *OR* windows, but leave it at Integer.MAX_VALUE
 on other 64-bit and 64-bit (48-bit) systems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Query parser contract changes?

2011-01-17 Thread karl.wright

Another data point: the standard query parser actually ALSO fails when you do 
anything other than a *:* query.  When you specify a field name, it returns 
zero results:

root@duck93:/data/solr-dym/solr-dym# curl 
http://localhost:8983/solr/nose/standard?q=value_0:a*;
?xml version=1.0 encoding=UTF-8?
response
lst name=responseHeaderint name=status0/intint name=QTime7/intl
st name=paramsstr name=qvalue_0:a*/str/lst/lstresult name=respons
e numFound=0 start=0/
/response

But:

root@duck93:/data/solr-dym/solr-dym# curl 
http://localhost:8983/solr/nose/standard?q=*:*;
?xml version=1.0 encoding=UTF-8?
response
lst name=responseHeaderint name=status0/intint name=QTime244/int
lst name=paramsstr name=q*:*/str/lst/lstresult name=response nu
mFound=59431646 start=0docstr name=latitude40.55856/strstr name=l
ongitude44.37457/strstr name=referenceLANGUAGE=und|TYPE=STREET|ADDR_TOWN
SHIP_NAME=Armenia|ADDR_COUNTRY_NAME=Armenia|ADDR_STREET_NAME=A329|TITLE=A329, Ar
menia, Armenia/str/docdocstr name=latitude40.7703/strstr name=long
itude43.838/strstr name=referenceLANGUAGE=und|TYPE=STREET|ADDR_TOWNSHIP_
NAME=Armenia|ADDR_COUNTRY_NAME=Armenia|ADDR_STREET_NAME=A330|TITLE=A330, Armenia
...

The schema has not changed:

!-- Level 0 non-language value field --
field name=othervalue_0 type=string_idx_normed 
required=false/

...where string_idx_normed is declared in the following way:

fieldType name=string_idx_normed class=solr.TextField
indexed=true stored=false omitNorms=false
analyzer type=index
tokenizer class=solr.ICUTokenizerFactory /
filter class=solr.ICUFoldingFilterFactory /
/analyzer
analyzer type=query
tokenizer class=solr.ICUTokenizerFactory /
filter class=solr.ICUFoldingFilterFactory /
/analyzer
/fieldType

... which shouldn't matter anyway because even a simple TermQuery return from 
my query parser method doesn't work any more.

Karl

From: ext karl.wri...@nokia.com [mailto:karl.wri...@nokia.com]
Sent: Monday, January 17, 2011 10:30 AM
To: dev@lucene.apache.org
Subject: Query parser contract changes?

Hi folks,

I'm sorely puzzled by the fact that my QParser implementation ceased to work 
after the latest Solr/Lucene trunk update.  My previous update was about ten 
days ago, right after Mike made his index changes.

The symptom is that, although the query parser is correctly called, and seems 
to have the right arguments, the Query it is returning seems to be ignored.  I 
always get zero results.  I eliminated any possibility of error by just 
hardwiring the return of a TermQuery, and that too always yields zero results.

I was able to confirm, using the standard handler with the default query 
parser, that the index is in fine shape.  So I was wondering if the contract 
for QParser had changed in some subtle way that I missed?

Karl

[jira] Commented: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory


[ 
https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982682#action_12982682
 ] 

Uwe Schindler commented on LUCENE-2832:
---

Sorry my last comment was stupid, as 1/8 of 8TB is still larger as 
Integer.MAX_VALUE (I was thinking of Long.MAX_VALUE).

I still have no idea why this fails, as 8 TB of address space should be enough 
for thousands of 2 GB blocks.

 on Windows 64-bit, maybe we should default to a better maxBBufSize in 
 MMapDirectory
 ---

 Key: LUCENE-2832
 URL: https://issues.apache.org/jira/browse/LUCENE-2832
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2832.patch


 Currently the default max buffer size for MMapDirectory is 256MB on 32bit and 
 Integer.MAX_VALUE on 64bit:
 {noformat}
 public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? 
 Integer.MAX_VALUE : (256 * 1024 * 1024);
 {noformat}
 But, in windows on 64-bit, you are practically limited to 8TB. This can cause 
 problems in extreme cases, such as: 
 http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher
 Perhaps it would be good to change this default such that its 256MB on 32Bit 
 *OR* windows, but leave it at Integer.MAX_VALUE
 on other 64-bit and 64-bit (48-bit) systems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Let's drop Maven Artifacts !

2011-01-17 Thread Steven A Rowe

On 1/17/2011 at 1:53 AM, Michael Busch wrote:
 I don't think any user needs the ability to run an ant target on
 Lucene's sources to produce maven artifacts

I want to be able to make modifications to the Lucene source, install Maven 
snapshot artifacts in my local repository, then depend on those snapshots from 
other projects.  I doubt I'm alone.

Steve

[jira] Commented: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory


[ 
https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982692#action_12982692
 ] 

Robert Muir commented on LUCENE-2832:
-

In this case, its very extreme. the user had 1.1 billion documents on one 
windows server.

I am not sure if this issue will even help anyone at all: will a smaller buffer 
really help fragmentation in these cases?
The user never responded to my suggestion to change the buffer size.

I think a good option here is to do nothing at all, but I'm not opposed to 
reducing the buffer *if* it will actually help,
mainly because the MultiMMapIndexInput is sped up and it shouldn't cause as 
much slowdown as before.


 on Windows 64-bit, maybe we should default to a better maxBBufSize in 
 MMapDirectory
 ---

 Key: LUCENE-2832
 URL: https://issues.apache.org/jira/browse/LUCENE-2832
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2832.patch


 Currently the default max buffer size for MMapDirectory is 256MB on 32bit and 
 Integer.MAX_VALUE on 64bit:
 {noformat}
 public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? 
 Integer.MAX_VALUE : (256 * 1024 * 1024);
 {noformat}
 But, in windows on 64-bit, you are practically limited to 8TB. This can cause 
 problems in extreme cases, such as: 
 http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher
 Perhaps it would be good to change this default such that its 256MB on 32Bit 
 *OR* windows, but leave it at Integer.MAX_VALUE
 on other 64-bit and 64-bit (48-bit) systems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Let's drop Maven Artifacts !

On Mon, Jan 17, 2011 at 11:06 AM, Steven A Rowe sar...@syr.edu wrote:
 On 1/17/2011 at 1:53 AM, Michael Busch wrote:
 I don't think any user needs the ability to run an ant target on
 Lucene's sources to produce maven artifacts

 I want to be able to make modifications to the Lucene source, install Maven 
 snapshot artifacts in my local repository, then depend on those snapshots 
 from other projects.  I doubt I'm alone.


And personally I would be totally fine with this, where maven is in
/dev-tools, just like eclipse and idea configuration, and we can even
put a whole README.txt in there that says these are tools for
developers and if they start rotting they will be deleted without a
second thought.

but requiring special artifacts is a different story, its my
understanding that in anything but a hello world maven project you
need your own local repository anyway. So such a person can simply
install their own artifacts with /dev-tools into their local
repository... problem solved.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-17 Thread Shay Banon (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982695#action_12982695
]

Shay Banon commented on LUCENE-2474:

Yea, I got the reasoning for Set, we can use that, CHM with PRESENT. If you
want, I can attach a simple MapBackedSet that makes any Map a Set.

Still, I think that using CopyOnWriteArrayList is best here. I don't think that
adding and removing listeners is something that will be done often in an app.
But I might be mistaken. In this case, traversal over listeners is much better
on CopyOnWriteArrayList compared to CHM.

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean
custom caches that use the IndexReader (getFieldCacheKey)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Let's drop Maven Artifacts !

2011-01-17 Thread Earwin Burrfoot

You're not alone. :)
But, I bet, much more people would like to skip that step and have
their artifacts downloaded from central.

On Mon, Jan 17, 2011 at 19:06, Steven A Rowe sar...@syr.edu wrote:
 On 1/17/2011 at 1:53 AM, Michael Busch wrote:
 I don't think any user needs the ability to run an ant target on
 Lucene's sources to produce maven artifacts

 I want to be able to make modifications to the Lucene source, install Maven 
 snapshot artifacts in my local repository, then depend on those snapshots 
 from other projects.  I doubt I'm alone.

 Steve




-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Let's drop Maven Artifacts !

On Mon, Jan 17, 2011 at 11:17 AM, Earwin Burrfoot ear...@gmail.com wrote:
 You're not alone. :)
 But, I bet, much more people would like to skip that step and have
 their artifacts downloaded from central.

Maybe, but perhaps they will need to compromise and use jar files or
install into their local themselves, because currently they have to
use svn checkout since we are letting maven issues prevent us from
releasing.

I think its been too long since we had a release, I'm gonna forget
maven exists and start working towards a release. I'll cross my
fingers and hope that I can get 3 +1 votes.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory


 [ 
https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2832:


Fix Version/s: (was: 3.1)

I am removing 3.1 as I think its the safest option.

We can revisit if someone is willing to test parameters on enormous indexes 
(200GB, 500GB, 1TB, ...)
otherwise we are just guessing.


 on Windows 64-bit, maybe we should default to a better maxBBufSize in 
 MMapDirectory
 ---

 Key: LUCENE-2832
 URL: https://issues.apache.org/jira/browse/LUCENE-2832
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2832.patch


 Currently the default max buffer size for MMapDirectory is 256MB on 32bit and 
 Integer.MAX_VALUE on 64bit:
 {noformat}
 public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? 
 Integer.MAX_VALUE : (256 * 1024 * 1024);
 {noformat}
 But, in windows on 64-bit, you are practically limited to 8TB. This can cause 
 problems in extreme cases, such as: 
 http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher
 Perhaps it would be good to change this default such that its 256MB on 32Bit 
 *OR* windows, but leave it at Integer.MAX_VALUE
 on other 64-bit and 64-bit (48-bit) systems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (SOLR-1674) improve analysis tests, cut over to new API


[ 
https://issues.apache.org/jira/browse/SOLR-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982740#action_12982740
 ] 

Robert Muir commented on SOLR-1674:
---

i'd still like to add posinc tests for some of these tokenstreams,
but also other ones in the analyzers module too (e.g. ones from lucene contrib).

i'll set 3.2 for now.

 improve analysis tests, cut over to new API
 ---

 Key: SOLR-1674
 URL: https://issues.apache.org/jira/browse/SOLR-1674
 Project: Solr
  Issue Type: Test
  Components: Schema and Analysis
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-1674.patch, SOLR-1674.patch, SOLR-1674_speedup.patch


 This patch
 * converts all analysis tests to use the new tokenstream api
 * converts most tests to use the more stringent assertion mechanisms from 
 lucene
 * adds new tests to improve coverage
 Most bugs found by more stringent testing have been fixed, with the exception 
 of SynonymFilter.
 The problems with this filter are more serious, the previous tests were 
 essentially a no-op.
 The new tests for SynonymFilter test the current behavior, but have FIXMEs 
 with what I think the old test wanted to expect in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (SOLR-1674) improve analysis tests, cut over to new API


 [ 
https://issues.apache.org/jira/browse/SOLR-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1674:
--

Fix Version/s: (was: 3.1)
   (was: 1.5)

 improve analysis tests, cut over to new API
 ---

 Key: SOLR-1674
 URL: https://issues.apache.org/jira/browse/SOLR-1674
 Project: Solr
  Issue Type: Test
  Components: Schema and Analysis
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-1674.patch, SOLR-1674.patch, SOLR-1674_speedup.patch


 This patch
 * converts all analysis tests to use the new tokenstream api
 * converts most tests to use the more stringent assertion mechanisms from 
 lucene
 * adds new tests to improve coverage
 Most bugs found by more stringent testing have been fixed, with the exception 
 of SynonymFilter.
 The problems with this filter are more serious, the previous tests were 
 essentially a no-op.
 The new tests for SynonymFilter test the current behavior, but have FIXMEs 
 with what I think the old test wanted to expect in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Closed: (LUCENE-2552) If index is pre-3.0 IndexWriter does not fail on open


 [ 
https://issues.apache.org/jira/browse/LUCENE-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-2552.
-

   Resolution: Duplicate
Lucene Fields:   (was: [New])

Duplicate of LUCENE-2720

 If index is pre-3.0 IndexWriter does not fail on open
 -

 Key: LUCENE-2552
 URL: https://issues.apache.org/jira/browse/LUCENE-2552
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 3.1, 4.0


 IndexReader.open() fails for all old pre-3.0 indexes in Lucene trunk. This is 
 tested by TestBackwardCompatibility. On the other hand, IndexWriter's ctor 
 does not fail on open an existing index, that contains an old segment, 
 because it does not check preexisting segments. It only throws 
 IndexFormatTooOldException, if you merge segments or open a getReader(). When 
 ConcurrentMergeScheduler  is used, this may happen in an foreign thread which 
 makes it even worse.
 Mike and me propose:
 - In 3.x introduce a new segments file format when committing, that contains 
 the oldest and newest version of the index segments (not sure which version 
 number to take here), this file format has new version, so its easy to detect 
 (DefaultSegmentsFileWriter/Reader)
 - In trunk when opening IndexWriter check the following: If segments file is 
 in new format, check minimum version from this file, if pre-3.0 throw IFTOE; 
 if segments file is in old format (can be 3.0 or 3.x index not yet updated), 
 try to open FieldsReader, as 2.9 indexes only can be detected using this - 
 older indexes should fail before and never come to that place. If this 
 succeeds, write a new segments file in new format (maybe after commit or 
 whatever)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (SOLR-2279) Add a MockDirectoryFactory (or similar) for Solr tests

[
https://issues.apache.org/jira/browse/SOLR-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated SOLR-2279:
--

Fix Version/s: (was: 3.1)

moving out.. i don't see myself fixing this test issue very quickly.

Add a MockDirectoryFactory (or similar) for Solr tests
--

Key: SOLR-2279
URL: https://issues.apache.org/jira/browse/SOLR-2279
Project: Solr
Issue Type: Test
Components: Build
Reporter: Robert Muir
Fix For: 4.0

Attachments: SOLR-2279.patch

Currently, all Lucene tests open directories with newDirectory() [and
soon-to-be added newFSDirectory() which always ensures the directory returned
is an FSDir subclass, see LUCENE-2804 for this]. Additionally the directory
is wrapped with MockDirectoryWrapper.
This has a number of advantages:
* By default the directory implementation is random, but you can easily
specify a specific impl e.g. -Dtests.directory=MMapDirectory. When proposing
a change to one of our directory implementations, we can run all tests with
it this way... it would be good for Solr tests to respect this too.
* The test framework (LuceneTestCase before/afterclass) ensures that these
directories are properly closed, if not, it causes the test to fail with a
stacktrace of where you
first opened the directory.
* MockDirectoryWrapper.close() then ensures that there are no resource leaks
by default, when you open a file they save the stacktrace of where you opened
it from. If you try to close the directory without say, closing an
IndexReader, it fails with the stacktrace of where you opened the reader
from. This is helpful for tracking down resource leaks. Currently Solr warns
if it cannot delete its test temporary directory, but this is better since
you know exactly where the resource leak came from. This can be disabled with
an optional setter which we should probably expose for some tests that have
known leaks like SpellCheck.
* MockDirectoryWrapper enforce consistent test behavior on any operating
system, as it won't be dependent on the return value of FSDirectory.open
* MockDirectoryWrapper has a number of other checks and features, such as
simulating a crash, simulating disk full, emulating windows (where you can't
delete open files), etc.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl


 [ 
https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2374:
--

Summary: Add reflection API to AttributeSource/AttributeImpl  (was: Add 
introspection API to AttributeSource/AttributeImpl)

 Add reflection API to AttributeSource/AttributeImpl
 ---

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2374-3x.patch


 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose an simple API that get a default 
 implementation in AttributeImpl (just like toString() current):
 - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes its a singleton) of a key-value pair, e.g. 
 term-foobar,startOffset-Integer.valueOf(0),...
 - AttributeSource gets the same method, it just concat the iterators of each 
 getAttributeImplsIterator() AttributeImpl
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets iterator and lists), but we simply remove the 
 documentation for the format. (Char)TermAttribute gets a special impl fo 
 toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (SOLR-2261) layout.vm refers to old version of jquery


 [ 
https://issues.apache.org/jira/browse/SOLR-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2261.
---

Resolution: Fixed

This was not downloading the file at all, instead it was getting a 404 error as 
Eric described.

Committed revision 1060014.

Thanks Eric!


 layout.vm refers to old version of jquery
 -

 Key: SOLR-2261
 URL: https://issues.apache.org/jira/browse/SOLR-2261
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Eric Pugh
Priority: Minor
 Fix For: 3.1


 The velocity template layout.vm that includes jquery refers to an older 1.2.3 
 version of jquery:
 http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/example/solr/conf/velocity/layout.vm
 Checked in is a new 1.4.3 version: 
 http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/src/webapp/web/admin/
 The line that says: script type=text/javascript 
 src=#{url_for_solr}/admin/jquery-1.2.3.min.js/script should be changed 
 to script type=text/javascript 
 src=#{url_for_solr}/admin/jquery-1.4.3.min.js/script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Let's drop Maven Artifacts !

2011-01-17 Thread Mattmann, Chris A (388J)

On Jan 17, 2011, at 8:06 AM, Steven A Rowe wrote:

 On 1/17/2011 at 1:53 AM, Michael Busch wrote:
 I don't think any user needs the ability to run an ant target on
 Lucene's sources to produce maven artifacts
 
 I want to be able to make modifications to the Lucene source, install Maven 
 snapshot artifacts in my local repository, then depend on those snapshots 
 from other projects.  I doubt I'm alone.

+1, you're not. The only way I've ever used Lucene has been via a Maven 
dependency, and that was the original way I found it starting way back in 
lucene-core-2.0.0. If Lucene wasn't in Maven, it would be a HUGE 
disappointment, and an impediment towards using it.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-17 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982761#action_12982761
]

Yonik Seeley commented on LUCENE-2474:
--

bq. Still, I think that using CopyOnWriteArrayList is best here.

Agree - I think we should optimize for good/correct behavior.

I'd like even more for there to be just a single CopyOnWriteArrayList per
top-level reader that is then propagated to all sub/segment readers,
including new ones on a reopen. But I guess Mike indicated that was currently
too hard/hairy.

The static is really non-optimal though - among other problems, it requires
systems with multiple readers (and wants to do different things with different
readers, such as maintain separate caches) to figure out what top-level reader
a segment reader is associated with. And given that we are dealing with
IndexReader instances in the callbacks, and not ReaderContext objects, this
seems impossible?

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean
custom caches that use the IndexReader (getFieldCacheKey)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (SOLR-849) Add bwlimit support to snappuller

2011-01-17 Thread Otis Gospodnetic (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-849.
---

Resolution: Duplicate

Implemented in SOLR-2099.

 Add bwlimit support to snappuller
 -

 Key: SOLR-849
 URL: https://issues.apache.org/jira/browse/SOLR-849
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: SOLR-849.patch


 From http://markmail.org/message/njnbh5gbb2mvfe24

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Let's drop Maven Artifacts !

2011-01-17 Thread Steven A Rowe

On 1/17/2011 at 11:25 AM, Robert Muir wrote:
 On Mon, Jan 17, 2011 at 11:06 AM, Steven A Rowe sar...@syr.edu wrote:
  On 1/17/2011 at 1:53 AM, Michael Busch wrote:
  I don't think any user needs the ability to run an ant target on
  Lucene's sources to produce maven artifacts
 
  I want to be able to make modifications to the Lucene source, install
  Maven snapshot artifacts in my local repository, then depend on those
  snapshots from other projects.  I doubt I'm alone.
 
 And personally I would be totally fine with this, where maven is in
 /dev-tools, just like eclipse and idea configuration, and we can even
 put a whole README.txt in there that says these are tools for
 developers and if they start rotting they will be deleted without a
 second thought.
 
 but requiring special artifacts is a different story

I have it wrong in LUCENE-2657.  It creates special artifacts intended for 
publishing via public Maven repositories.  But for the purposes of publishing 
(as opposed to locally modified sources), the artifacts published through 
public Maven repositories should be *exactly* the same ones produced by the Ant 
build, with the obvious exception of the POMs.  This is the model used by 
previous releases, and if we continue the tradition of publishing Maven 
artifacts (as we have since the 1.9.1 release), the model should not change.

Steve

[jira] Resolved: (SOLR-2259) Improve analyzer/version handling in Solr

[
https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir resolved SOLR-2259.
---

Resolution: Fixed

Improve analyzer/version handling in Solr
-

Key: SOLR-2259
URL: https://issues.apache.org/jira/browse/SOLR-2259
Project: Solr
Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.1, 4.0

Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259_part3.patch,
SOLR-2259part2.patch, SOLR-2259part4.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (SOLR-2269) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt


[ 
https://issues.apache.org/jira/browse/SOLR-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982794#action_12982794
 ] 

Robert Muir commented on SOLR-2269:
---

I just realized I've made the same mistake (somehow, i never noticed these 
contribs had their own CHANGES.txt files)

I'll start working on sorting out these CHANGES.txt's and synchronizing them in 
branch_3x/trunk to be consistent .


 contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt
 

 Key: SOLR-2269
 URL: https://issues.apache.org/jira/browse/SOLR-2269
 Project: Solr
  Issue Type: Task
  Components: contrib - Clustering, contrib - DataImportHandler, 
 contrib - Solr Cell (Tika extraction)
Affects Versions: 3.1, 4.0
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0


 http://www.lucidimagination.com/search/document/b8c19488a691265c/changes_mess
 {quote}
 I realized that some entries for DIH are in
 solr/CHANGES.txt. These should go solr/contrib/dataimporthandler/CHANGES.txt
 (Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt
 have 1.5-dev title. These should be 4.0-dev or 3.1-dev.
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-17 Thread Chris A. Mattmann (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982797#action_12982797
 ] 

Chris A. Mattmann commented on LUCENE-2657:
---

Hey Guys,

I've set this up on some other Apache projects (Nutch, Tika [NetCDF4] and SIS 
so far), and basically it involved:

1. moddin'g build.xml according to Sonatype's guide (see build.xml section)

https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide

2. adding pom.xmls for each artifact to be published

I'll throw together a patch for this and see if I can't make this process a bit 
easier.

Thanks.

Cheers,
Chris

 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (SOLR-2315) analysis.jsp highlight matches no longer works


[ 
https://issues.apache.org/jira/browse/SOLR-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982822#action_12982822
 ] 

Uwe Schindler commented on SOLR-2315:
-

I found the bug, will fix together with analysis.jsp rewrite in LUCENE-2374 
(this changes lots of internals so its easy to fix).

Problem is that a non-generified List[] in printRow causes wrong contains 
lookup that always returns false, so matching tokens are never seen..

 analysis.jsp highlight matches no longer works
 

 Key: SOLR-2315
 URL: https://issues.apache.org/jira/browse/SOLR-2315
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Hoss Man
 Fix For: 3.1, 4.0


 As noted by Teruhiko Kurosaka on the mailing list, at some point since Solr 
 1.4, highlight matches stoped working on the analysis.jsp  -- on both the 
 3x and trunk branches

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (SOLR-2269) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt


 [ 
https://issues.apache.org/jira/browse/SOLR-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2269.
---

Resolution: Fixed

Committed revision 1060057, 1060058 (3x)

 contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt
 

 Key: SOLR-2269
 URL: https://issues.apache.org/jira/browse/SOLR-2269
 Project: Solr
  Issue Type: Task
  Components: contrib - Clustering, contrib - DataImportHandler, 
 contrib - Solr Cell (Tika extraction)
Affects Versions: 3.1, 4.0
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0


 http://www.lucidimagination.com/search/document/b8c19488a691265c/changes_mess
 {quote}
 I realized that some entries for DIH are in
 solr/CHANGES.txt. These should go solr/contrib/dataimporthandler/CHANGES.txt
 (Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt
 have 1.5-dev title. These should be 4.0-dev or 3.1-dev.
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (SOLR-2160) Unknown query type 'func'


 [ 
https://issues.apache.org/jira/browse/SOLR-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2160.
---

Resolution: Fixed

Marking as fixed... please reopen if you think there might still be a bug, 
but again I haven't seen issues in a very long time

 Unknown query type 'func'
 -

 Key: SOLR-2160
 URL: https://issues.apache.org/jira/browse/SOLR-2160
 Project: Solr
  Issue Type: Test
  Components: Build
Affects Versions: 3.1, 4.0
 Environment: Hudson
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: SOLR-2160.patch


 Several test methods in TestTrie failed in hudson, with errors such as this:
 Caused by: org.apache.solr.common.SolrException: Unknown query type 'func'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Let's drop Maven Artifacts !

2011-01-17 Thread Michael Busch


On 1/17/11 8:06 AM, Steven A Rowe wrote:

On 1/17/2011 at 1:53 AM, Michael Busch wrote:

I don't think any user needs the ability to run an ant target on
Lucene's sources to produce maven artifacts

I want to be able to make modifications to the Lucene source, install Maven 
snapshot artifacts in my local repository, then depend on those snapshots from 
other projects.  I doubt I'm alone.



This is something I would feel comfortable not supporting in Lucene 
out-of-the-box, because if someone needs to use modified sources it's 
not unreasonable to expect that they can also create their own pom files 
for the modified jars.


I do think though that we should keep publishing official artifacts to 
a central repo.


 Michael

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Let's drop Maven Artifacts !

2011-01-17 Thread Steven A Rowe

On 1/17/2011 at 3:05 PM, Michael Busch wrote:
 On 1/17/11 8:06 AM, Steven A Rowe wrote:
  On 1/17/2011 at 1:53 AM, Michael Busch wrote:
  I don't think any user needs the ability to run an ant target on
  Lucene's sources to produce maven artifacts
  I want to be able to make modifications to the Lucene source, install
  Maven snapshot artifacts in my local repository, then depend on those
  snapshots from other projects.  I doubt I'm alone.
 
 
 This is something I would feel comfortable not supporting in Lucene
 out-of-the-box, because if someone needs to use modified sources it's
 not unreasonable to expect that they can also create their own pom files
 for the modified jars.

This makes zero sense to me - no one will ever make their own POMs, except 
maybe the empty shells Maven will auto-create for you when run the 
install:install-file goal.

The key thing that LUCENE-2657 provides is POMs that can be verified correct 
via Maven itself - when Maven performs a build, the POMs are checked for 
correctness, and if the build fails, you can tell something is wrong.  Anything 
short of that won't cut it long term.

Maybe from your perspective building the project with the POMs is unnecessary, 
but from mine it is a *requirement*.  

And, happily IMHO, users get local build/install for free.

Steve

[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

2011-01-17 Thread Doron Cohen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982872#action_12982872
]

Doron Cohen commented on LUCENE-2236:
-

{quote}
well my concern about the deprecated methods is we get into the hairy backwards
compat situation...
It might be ok to essentially fix Similarity to be the way we want for 4.0
(break it) since its an expert API anyway.
This patch was just a quick stab...
I definitely agree with you about the name though, i prefer Similarity.
{quote}

So let's keep that name (Similarity) :)

{quote}
Well honestly I think what you are saying is really needed for the future
{quote}

Ok one step at a time makes sense.. so it means that fieldName parameters
remain, although the Similarity object is created per given field, well, ok,
another day...

{quote}
Similarity would need to be able to 'setup' a query (e.g. things like IDF,
building score caches for the query, whatever), and then also score an
individual document.
{quote}

Interesting...
(flexible-scoring and bulk-postings works are still unknowns to me.)
So Similarity is not only per field but also per query/scorer..
and Query would have an abstract method getSimilarityProvider(fieldName) which
would be implemented by each concrete query, neatly separating finding matches
from scores computation, and allowing more extendable scoring. Nice.
Also, perhaps what seems to be like an inflation of Similarity objects (per
query per field) is one more good reason to keep the field name params for now.

Similarity can only be set per index, but I may want to adjust scoring
behaviour at a field level
-

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Let's drop Maven Artifacts !

2011-01-17 Thread Michael Busch


On 1/17/11 12:27 PM, Steven A Rowe wrote:

This makes zero sense to me - no one will ever make their own POMs


I did :) (for a different project though).


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

[
https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982888#action_12982888
]

Robert Muir commented on LUCENE-2236:
-

bq. So let's keep that name (Similarity)

OK, I'll fix the patch, to rename FieldSimilarity-Similarity

{quote}
So Similarity is not only per field but also per query/scorer..
and Query would have an abstract method getSimilarityProvider(fieldName) which
would be implemented by each concrete query, neatly separating finding matches
from scores computation, and allowing more extendable scoring. Nice.
Also, perhaps what seems to be like an inflation of Similarity objects (per
query per field) is one more good reason to keep the field name params for now.
{quote}

Well I'm not totally sure how we want to do it, but definitely I think we want
to split Scorer's calculations and finding matches as you say,
and also split Weight's calculations and resource management

For example, TermWeight today has a PerReaderTermState, which contains all the
information you need to calculate the setup portion
without doing any real I/O (e.g. docFreq, totalTermFreq, totalCollectionFreq,
...) So maybe this is the right thing to pass to Similarity's query setup.

The Weight then would just be responsible for managing termstate and creating a
Scorer...

I think also the Similarity needs to be fully responsible for Explanations...
but most users wouldn't have to interact with this I think.
Instead I think typically their base class (TFIDFSimilarity or whatever it
is) would typically provide this, based on the methods and API
it exposes: tf(), idf(), but this would allow us to also have other
fully-fleshed out base classes like BM25Similarity, that you can extend
and tune based on the parameters that make sense to it.

Anyway these are just some thoughts, first I'm going to adjust the patch to
keep our existing name Similarity.

Similarity can only be set per index, but I may want to adjust scoring
behaviour at a field level
-

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-17 Thread Michael Busch (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982900#action_12982900
]

Michael Busch commented on LUCENE-2324:
---

My last commit yesterday made almost all test cases pass.

The ones that test flush-by-ram are still failing. Also TestStressIndexing2
still fails. The reason has to do with how deletes are pushed into
bufferedDeletes. E.g. if I call addDocument() instead of updateDocument() in
TestStressIndexing.IndexerThread then the test passes.

I need to look more into that problem, but otherwise it's looking good and
we're pretty close!

Per thread DocumentsWriters that write their own private segments
-

Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch

Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch,
test.out, test.out, test.out, test.out

See LUCENE-2293 for motivation and more details.
I'm copying here Mike's summary he posted on 2293:
Change the approach for how we buffer in RAM to a more isolated
approach, whereby IW has N fully independent RAM segments
in-process and when a doc needs to be indexed it's added to one of
them. Each segment would also write its own doc stores and
normal segment merging (not the inefficient merge we now do on
flush) would merge them. This should be a good simplification in
the chain (eg maybe we can remove the *PerThread classes). The
segments can flush independently, letting us make much better
concurrent use of IO CPU.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-17 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Rutherglen updated LUCENE-2324:
-

Attachment: LUCENE-2324.patch

Very nice! Looks like we needed all kinds of IW syncs? I noticed that in
addition to TestStressIndexing2, TestNRTThreads was also failing. The attached
patch fixes both by adding a sync on DW for deletes (and the update doc delete
term). Time to add the RAM usage?

Per thread DocumentsWriters that write their own private segments
-

Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch,
LUCENE-2324.patch, test.out, test.out, test.out, test.out

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Michael McCandless

Yes!

Mike

On Mon, Jan 17, 2011 at 7:47 AM, Shai Erera ser...@gmail.com wrote:
 This sounds like incremental field updates :).

 Shai

 On Mon, Jan 17, 2011 at 1:24 PM, Michael McCandless
 luc...@mikemccandless.com wrote:

 On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
  But: they don't yet support updating the values (the goal is to allow
  this, eventually).  This is just the first step.
 
  No?  Hmm... I thought that was a main part of the functionality?

 Patches welcome ;)

 Seriously, how would you do it?  IE, I don't like how norms handle it
 today -- on changing a single value we must write the full array (for
 all docs).  Same problem w/ del docs, though since its 1 bit per doc
 the cost is far less.

 Better would be a stacked approach, where the orig full array remains
 and we write sparse deltas (pairs of docID + new value), and at init
 we load the base and apply all the diffs (in order).  Merging would
 periodically coalesce them down again...

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Release schedule Lucene 4?

2011-01-17 Thread Michael McCandless

I am hoping that it'll be in 2011... but don't hold me to that.  It's
really not possible to predict!

You can always use a trunk version and give feedback :)

But beware that it's unsable, meaning APIs and the index format can
suddenly change.

Mike

On Mon, Jan 17, 2011 at 8:51 AM, Gregor Heinrich gre...@arbylon.net wrote:
 Hi Mike, all --

 a (sorrily slow) thanks for this response ;)

 From the ensuing discussion, it sounds like there's a LOT to be in v4, and
 not raising wrong expectation by giving dates is appreciated ;)

 Only thing is, are we talking any time in 2012 or 2011, just to have a
 coarse-grained estimate without any assumptions attached?

 Best

 gregor





 On 1/15/11 3:20 PM, Michael McCandless wrote:

 This is unfortunately hard to say!

 There's tons of good stuff in 4.0, so we'd really like to release
 sooner rather than later.

 But then there's also alot of work remaining, eg we have 3 feature
 branches in flight right now, that we need to wrap up and land on
 trunk:

   * realtime (gives us concurrent flushing during indexing)

   * docvalues (adds column-stride fields)

   * bulkpostings (gives good search speedup for intblock codecs)

 Plus many open Jira issues.  So it's hard to predict when all of this
 will be done

 Mike

 On Fri, Jan 14, 2011 at 12:31 PM, Gregor Heinrichgre...@arbylon.net
  wrote:

 Dear Lucene team,

 I am wondering whether there is an updated Lucene release schedule for
 the
 v4.0 stream.

 Any earliest/latest alpha/beta/stable date? And if not yet, where to
 track
 such info?

 Thanks in advance from Germany

 gregor

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl


 [ 
https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2374:
--

Attachment: shot2.png
shot1.png
LUCENE-2374-3x.patch

New patch with analysis.jsp fixed (also SOLR-2315):
- highlighting works again
- only attributes are shown that exist at the step of analysis
- attribute names changed a little bit, because it uses the ones from 
reflection api
- Attribute class name shown in mouse hover
- start/endOffset now in two different lines (no chance to do it other without 
another special case)
- payloads are no longer printed as text, because it used default platform 
encoding (new String(byte[]))!

I also added some example screenshots!

 Add reflection API to AttributeSource/AttributeImpl
 ---

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, 
 shot2.png


 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose an simple API that get a default 
 implementation in AttributeImpl (just like toString() current):
 - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes its a singleton) of a key-value pair, e.g. 
 term-foobar,startOffset-Integer.valueOf(0),...
 - AttributeSource gets the same method, it just concat the iterators of each 
 getAttributeImplsIterator() AttributeImpl
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets iterator and lists), but we simply remove the 
 documentation for the format. (Char)TermAttribute gets a special impl fo 
 toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl


 [ 
https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2374:
--

Attachment: shot4.png
shot3.png

 Add reflection API to AttributeSource/AttributeImpl
 ---

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, 
 shot2.png, shot3.png, shot4.png


 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose an simple API that get a default 
 implementation in AttributeImpl (just like toString() current):
 - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes its a singleton) of a key-value pair, e.g. 
 term-foobar,startOffset-Integer.valueOf(0),...
 - AttributeSource gets the same method, it just concat the iterators of each 
 getAttributeImplsIterator() AttributeImpl
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets iterator and lists), but we simply remove the 
 documentation for the format. (Char)TermAttribute gets a special impl fo 
 toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl


[ 
https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982921#action_12982921
 ] 

Robert Muir commented on LUCENE-2374:
-

+1, this looks great.

i think its really important to show all the attributes in analysis.jsp, e.g. 
KeywordAttribute.


 Add reflection API to AttributeSource/AttributeImpl
 ---

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, 
 shot2.png, shot3.png, shot4.png


 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose an simple API that get a default 
 implementation in AttributeImpl (just like toString() current):
 - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes its a singleton) of a key-value pair, e.g. 
 term-foobar,startOffset-Integer.valueOf(0),...
 - AttributeSource gets the same method, it just concat the iterators of each 
 getAttributeImplsIterator() AttributeImpl
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets iterator and lists), but we simply remove the 
 documentation for the format. (Char)TermAttribute gets a special impl fo 
 toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-17 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982926#action_12982926
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Looks like TestNRTThreads is still sometimes failing, if I moved the sync 
around then it passes and TestStressIndexing2 fails.  

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, 
 LUCENE-2324.patch, test.out, test.out, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO  CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2547) minimize autoboxing in NumericField

2011-01-17 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982927#action_12982927
]

Simon Willnauer commented on LUCENE-2547:
-

bq. I didn't dive deep into the details of this issue, but what will someone
who has only long/int (and not their counter objects) do? Will he need to
create a Long/Integer out of them?
Shai, there are primitive setters already though.

bq. The parameters cannot be null, so at least a null-check is missing
agreed

bq. If you try this out with a profiler you see no difference at all (loop
creating a field and setting lots of values) - the objects are shortliving so
JRE optimizes (allocates on thread local heap).
agreed!

I think calling setLong(longRef.longValue()) is not a big deal and an API
change / addition is not needed here. Moving out?

minimize autoboxing in NumericField
---

Attachments: LUCENE-2547.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2547) minimize autoboxing in NumericField


[ 
https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982928#action_12982928
 ] 

Uwe Schindler commented on LUCENE-2547:
---

That's I am talking about, don't clutter API for such useless and inconsistent 
stuff

 minimize autoboxing in NumericField
 ---

 Key: LUCENE-2547
 URL: https://issues.apache.org/jira/browse/LUCENE-2547
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.0.2
Reporter: Woody Anderson
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2547.patch


 dicIf you already have a Integer/Long/Double etc. 
 numericField.setLongValue(long) causes an unnecessary auto-unbox.
 actually, since internal to setLongValue there is:
 {code}
 fieldsData = Long.valueOf(value);
 {code}
 then, there is an explicit box anyway, so this makes setLongValue(Long) with 
 an auto-box of long roughly the same as setLongValue(long), but better if you 
 started with a Long.
 Long being replaceable with Integer, Float, Double etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-2547) minimize autoboxing in NumericField

2011-01-17 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2547.
-

Resolution: Won't Fix

moving out 

 minimize autoboxing in NumericField
 ---

 Key: LUCENE-2547
 URL: https://issues.apache.org/jira/browse/LUCENE-2547
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.0.2
Reporter: Woody Anderson
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2547.patch


 dicIf you already have a Integer/Long/Double etc. 
 numericField.setLongValue(long) causes an unnecessary auto-unbox.
 actually, since internal to setLongValue there is:
 {code}
 fieldsData = Long.valueOf(value);
 {code}
 then, there is an explicit box anyway, so this makes setLongValue(Long) with 
 an auto-box of long roughly the same as setLongValue(long), but better if you 
 started with a Long.
 Long being replaceable with Integer, Float, Double etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl

2011-01-17 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982931#action_12982931
 ] 

Simon Willnauer commented on LUCENE-2374:
-

nice work uwe!! +1 ;)

 Add reflection API to AttributeSource/AttributeImpl
 ---

 Key: LUCENE-2374
 URL: https://issues.apache.org/jira/browse/LUCENE-2374
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, 
 shot2.png, shot3.png, shot4.png


 AttributeSource/TokenStream inspection in Solr needs to have some insight 
 into the contents of AttributeImpls. As LUCENE-2302 has some problems with 
 toString() [which is not structured and conflicts with CharSequence's 
 definition for CharTermAttribute], I propose an simple API that get a default 
 implementation in AttributeImpl (just like toString() current):
 - IteratorMap.EntryString,? AttributeImpl.contentsIterator() returns an 
 iterator (for most attributes its a singleton) of a key-value pair, e.g. 
 term-foobar,startOffset-Integer.valueOf(0),...
 - AttributeSource gets the same method, it just concat the iterators of each 
 getAttributeImplsIterator() AttributeImpl
 No backwards problems occur, as the default toString() method will work like 
 before (it just gets iterator and lists), but we simply remove the 
 documentation for the format. (Char)TermAttribute gets a special impl fo 
 toString() according to CharSequence and a corresponding iterator.
 I also want to remove the abstract hashCode() and equals() methods from 
 AttributeImpl, as they are not needed and just create work for the 
 implementor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Created: (LUCENE-2872) Terms dict should block-encode terms

Terms dict should block-encode terms


 Key: LUCENE-2872
 URL: https://issues.apache.org/jira/browse/LUCENE-2872
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0
 Attachments: LUCENE-2872.patch

With PrefixCodedTermsReader/Writer we now encode each term standalone,
ie its bytes, metadata, details for postings (frq/prox file pointers),
etc.

But, this is costly when something wants to visit many terms but pull
metadata for only few (eg respelling, certain MTQs).  This is
particularly costly for sep codec because it has more metadata to
store, per term.

So instead I think we should block-encode all terms between indexed
term, so that the metadata is stored column stride instead.  This
makes it faster to enum just terms.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2872) Terms dict should block-encode terms


 [ 
https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2872:
---

Attachment: LUCENE-2872.patch

Patch.

I think it's basically working, but there are still a bunch of nocommits.

 Terms dict should block-encode terms
 

 Key: LUCENE-2872
 URL: https://issues.apache.org/jira/browse/LUCENE-2872
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2872.patch


 With PrefixCodedTermsReader/Writer we now encode each term standalone,
 ie its bytes, metadata, details for postings (frq/prox file pointers),
 etc.
 But, this is costly when something wants to visit many terms but pull
 metadata for only few (eg respelling, certain MTQs).  This is
 particularly costly for sep codec because it has more metadata to
 store, per term.
 So instead I think we should block-encode all terms between indexed
 term, so that the metadata is stored column stride instead.  This
 makes it faster to enum just terms.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-2654) bulk-code each chunk b/w indexed terms in the terms dict