[Lucene.Net] [jira] [Closed] (LUCENENET-421) Segment files occasionally disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-421. -- Resolution: Invalid Seems like the reporter is no longer interested in this issue. DIGY Segment files occasionally disappearing making index corrupted Key: LUCENENET-421 URL: https://issues.apache.org/jira/browse/LUCENENET-421 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1 Reporter: Fedor Taiakin IIS 7 under Win 2008 R2, index located on RAID 1 The only operations are Add Document and Delete Document, optimize = false. Occasionally the segment files disappear, corrupting the index. No other exceptions prior to the inability to open the index: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. --- System.IO.FileNotFoundException: Could not find file 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. File name: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs' at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-404) Improve brand logo design
[ https://issues.apache.org/jira/browse/LUCENENET-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046757#comment-13046757 ] Troy Howard commented on LUCENENET-404: --- I will post the artifacts here, and commit them to the repo. We need to get an SGA submitted from StackOverflow as well. Improve brand logo design - Key: LUCENENET-404 URL: https://issues.apache.org/jira/browse/LUCENENET-404 Project: Lucene.Net Issue Type: Sub-task Components: Project Infrastructure Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Labels: branding, logo The existing Lucene.Net logo leaves a lot to be desired. We'd like a new logo that is modern and well designed. To implement this, Troy is coordinating with StackOverflow/StackExchange to manage a logo design contest, the results of which will be our new logo design.
[Lucene.Net] [jira] [Created] (LUCENENET-424) IsolatedStorage Support for Windows Phone 7
IsolatedStorage Support for Windows Phone 7 --- Key: LUCENENET-424 URL: https://issues.apache.org/jira/browse/LUCENENET-424 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Reporter: Prescott Nasser Assignee: Prescott Nasser Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Create an IsolatedStorage Store to support Windows Phone 7
[jira] [Resolved] (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**
[ https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1707. -- Resolution: Not A Problem This is a trivial issue Use google collections immutable collections instead of Collections.unmodifiable** -- Key: SOLR-1707 URL: https://issues.apache.org/jira/browse/SOLR-1707 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Fix For: 3.3 Attachments: SOLR-1707.patch, TestPerf.java google collections offer true immutability and more memory efficiency - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3 In UIMAUpdateRequestProcessor the catch block in the processAdd() method can have a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046370#comment-13046370 ] Tommaso Teofili commented on SOLR-2582: --- A possible fix which still allows easy debugging could be to get the logging field property on processor initialization; then, if that was not configured, it's possible to get the unique key from the SolrCore passed in the initialize() method: String logFieldName = solrUIMAConfiguration.getLogField() != null ? solrUIMAConfiguration.getLogField() : solrCore.getSchema().getUniqueKeyField().getName(); UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3 In UIMAUpdateRequestProcessor the catch block in the processAdd() method can have a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
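The bounds bug and its guard can be sketched as follows; the class and method names here are illustrative, not the actual UIMAUpdateRequestProcessor code:

```java
// Minimal sketch of the SOLR-2582 fix: clamp the snippet length so that
// texts shorter than 100 chars no longer trigger StringIndexOutOfBoundsException.
// Class/method names are hypothetical stand-ins for the real processor code.
public class ErrorMessageDemo {
    public static String errorSnippet(String text) {
        StringBuilder sb = new StringBuilder("processing error on text: ");
        // text.substring(0, 100) throws for short texts; Math.min guards it
        sb.append(text.substring(0, Math.min(100, text.length())));
        return sb.toString();
    }
}
```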
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8710 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8710/ 3 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-131: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test7041670436tmp/_c.nrm (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-131: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test7041670436tmp/_c.nrm (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test7041670436tmp/_c.nrm (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching Error Message: null Stack Trace: junit.framework.AssertionFailedError: at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:152) at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:165) REGRESSION: org.apache.lucene.store.TestMultiMMap.testRandomChunkSizes Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/TestMultiMMap3983339852tmp/mmap363983339854tmp/_0_0.tib (Too many open files in system) Stack Trace: java.io.FileNotFoundException: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/TestMultiMMap3983339852tmp/mmap363983339854tmp/_0_0.tib (Too many open files in system) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:233) at org.apache.lucene.store.FSDirectory$FSIndexOutput.init(FSDirectory.java:416) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293) at org.apache.lucene.index.codecs.BlockTermsWriter.init(BlockTermsWriter.java:75) at org.apache.lucene.index.codecs.mocksep.MockSepCodec.fieldsConsumer(MockSepCodec.java:73) at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.init(PerFieldCodecWrapper.java:67) at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:55) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:58) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:80) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:75) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:457) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:421) at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:313) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:385) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1233) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1214) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:136) at org.apache.lucene.store.TestMultiMMap.assertChunking(TestMultiMMap.java:74) at org.apache.lucene.store.TestMultiMMap.testRandomChunkSizes(TestMultiMMap.java:51) at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) Build Log (for compile errors): [...truncated 3909 lines...]
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046403#comment-13046403 ] Simon Willnauer commented on LUCENE-2793: - Hey Varun, here are some more comments for the latest complete patch: * We should have a static instance for IOContext with Context.Other which you can use in BitVector / CheckIndex for instance. Maybe IOContext#DEFAULT_CONTEXT * It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfos * This is unrelated to your patch, but in PreFlexFields we should use IndexFileNames.segmentFileName(info.name, , PreFlexCodec.FREQ_EXTENSION) and IndexFileNames.segmentFileName(info.name, , PreFlexCodec.PROX_EXTENSION) instead of info.name + .frq and info.name + .prx * It seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext into SegmentWriteState and SegmentReadState; that way we don't need to change the Codec interface and clutter it with internals. This would also fix Mike's comment for FieldsConsumer etc. * TermVectorsWriter is only used in merges, so maybe it should also get a Context.Merge for consistency? * I really don't like OneMerge :) I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes and totalDocCount for now. * Small typo in RamDirectory; there is a space missing after the second file here: dir.copy(this, file, file, context); * SegmentReader should also use the static default IOContext - make sure it's used where needed :) Regarding the IOContext class, I think we should design for what we have right now, and since SegmentInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above.
Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible.
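The IOContext discussed above can be sketched roughly as follows. This is a hedged reading of the proposal in the comments (a context enum, a static default instance, and a MergeInfo abstraction exposing estimatedMergeBytes/totalDocCount), not a committed Lucene API; all names follow the discussion and may differ from what was eventually landed.

```java
// Sketch of the proposed IOContext, per the LUCENE-2793 discussion.
// Names (Context, MergeInfo, DEFAULT) mirror the comments, not final Lucene code.
public class IOContextSketch {
    public enum Context { MERGE, READ, FLUSH, OTHER }

    /** Abstraction replacing a direct OneMerge dependency, as suggested above. */
    public static class MergeInfo {
        public final long estimatedMergeBytes;
        public final int totalDocCount;
        public MergeInfo(long estimatedMergeBytes, int totalDocCount) {
            this.estimatedMergeBytes = estimatedMergeBytes;
            this.totalDocCount = totalDocCount;
        }
    }

    public final Context context;
    public final MergeInfo mergeInfo; // null unless context == MERGE

    public IOContextSketch(Context context, MergeInfo mergeInfo) {
        this.context = context;
        this.mergeInfo = mergeInfo;
    }

    /** Static default context, usable e.g. from BitVector / CheckIndex. */
    public static final IOContextSketch DEFAULT = new IOContextSketch(Context.OTHER, null);
}
```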
[jira] [Commented] (LUCENE-3180) Can't delete a document using deleteDocument(int docID) if using IndexWriter AND IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046415#comment-13046415 ] Danny Lade commented on LUCENE-3180: Hello Simon, I will add an ID to my documents, that should do the trick. Thanks for the help. :-) Can't delete a document using deleteDocument(int docID) if using IndexWriter AND IndexReader Key: LUCENE-3180 URL: https://issues.apache.org/jira/browse/LUCENE-3180 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.2 Environment: Windows Reporter: Danny Lade Attachments: ImpossibleLuceneCode.java It is impossible to delete a document with reader.deleteDocument(docID) if using an IndexWriter too. using: {code:java} writer = new IndexWriter(directory, config); reader = IndexReader.open(writer, true); {code} results in: {code:java} Exception in thread main java.lang.UnsupportedOperationException: This IndexReader cannot make any changes to the index (it was opened with readOnly = true) at org.apache.lucene.index.ReadOnlySegmentReader.noWrite(ReadOnlySegmentReader.java:23) at org.apache.lucene.index.ReadOnlyDirectoryReader.acquireWriteLock(ReadOnlyDirectoryReader.java:43) at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:1067) at de.morpheum.morphy.ImpossibleLuceneCode.main(ImpossibleLuceneCode.java:60) {code} and using: {code:java} writer = new IndexWriter(directory, config); reader = IndexReader.open(directory, false); {code} results in: {code:java} org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@S:\Java\Morpheum\lucene\write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.DirectoryReader.acquireWriteLock(DirectoryReader.java:765) at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:1067) at de.morpheum.morphy.ImpossibleLuceneCode.main(ImpossibleLuceneCode.java:69) {code}
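The workaround agreed on in the thread is to key each document with an application-level ID field and delete through the IndexWriter rather than the (read-only or lock-contended) IndexReader. A sketch of that pattern, assuming Lucene 3.x on the classpath and an "id" field indexed un-analyzed; shown as a usage fragment, not a complete program:

```
// Fragment (Lucene 3.x assumed): delete by ID term via the writer,
// instead of reader.deleteDocument(docID) which needs the write lock.
Document doc = new Document();
doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

// Later: delete the document through the same writer, keyed by its ID.
writer.deleteDocuments(new Term("id", "42"));
writer.commit();
```

This sidesteps both failures in the report: the writer already holds the write lock, and no read-only reader is asked to mutate the index.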
[jira] [Created] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type
DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently segment merger fails if the docvalues type is not compatible across segments. We already catch this problem if somebody changes the values type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers. I marked this 4.0 since it should not block the landing on trunk
[jira] [Created] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor External scoring consumes a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If the scoring file contains far fewer entries than there are docs in total, the big float array wastes a lot of memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but not more.
Re: JCC usage failure
Thank you for the support. I finally managed to get it working by reserving some words, and minimizing the number of wrapped methods by just including those that I specifically need: python -m jcc --jar orekit-5.0.jar --include commons-math-2.2.jar --package java.io --package org.apache.commons.math.geometry --shared --python orekit --reserved INFINITE --reserved NO_DATA --reserved ERROR --install --build Is there a way to influence the docstrings generated (__doc__ function?), or is there any way of converting from a javadoc to docstrings of the wrapped library? :) Thanks Regards /Petrus On Fri, Jun 3, 2011 at 5:39 PM, Andi Vajda va...@apache.org wrote: On Jun 3, 2011, at 1:21, Petrus Hyvönen petrus.hyvo...@gmail.com wrote: Hi, I am trying to use JCC to wrap a java library (orekit.org), and have successfully done so on the Mac platform. As I also use Windows, I am trying to do the same there. JCC compiles fine on both platforms (using --compiler=mingw32 on Windows, using the python xy distribution with mingw v4.5.2). The wrapper is successfully created on Mac, but on Windows I needed to add the .__main__ for jcc: python -m jcc.__main__ --jar orekit-5.0.jar --jar commons-math-2.2.jar --include orekit-data.zip --shared --python orekit --install --files separate --build The build goes on for some time and fails with the extract below. Does anyone have experience with this failure, and where does one start to solve it: is it the compiler, or JCC? I have also tried with a fresh install of mingw32 but no difference. Any help or directions appreciated. /Petrus This is very likely to be caused by some variable name coming from your java sources that is defined as a macro by the header files coming from your compiler. To work around this, add the variable name to the reserved word list by adding it to the jcc command line via the --reserved flag. To find which variable it is, look at the error messages below and at the code they refer to. For example, Dfp.h, line 109 or Dfp.cpp, line 22.
Andi.. In file included from build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:3:0: build\_orekit/org/apache/commons/math/dfp/Dfp.h:109:38: error: expected unqualified-id before numeric constant build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:22:32: error: expected unqualified-id before numeric constant build\_orekit\org\apache\commons\math\dfp\Dfp.cpp: In static member function 'static _jclass* org::apache::commons::math::dfp::Dfp::initializeClass()': build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:100:79: error: lvalue required as left operand of assignment build\_orekit\org\apache\commons\math\dfp\Dfp.cpp: In static member function 'static void org::apache::commons::math::dfp::t_Dfp::initialize(PyObject*)': build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:476:101: error: expected unqualified-id before numeric constant error: command 'gcc' failed with exit status 1 -- _ Petrus Hyvönen, Uppsala, Sweden Mobile Phone/SMS:+46 73 803 19 00
[jira] [Updated] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grotzke updated SOLR-2583: - Attachment: FileFloatSource.java.patch The attached patch changes FileFloatSource to use a map of score by doc. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring consumes a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If the scoring file contains far fewer entries than there are docs in total, the big float array wastes a lot of memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but not more.
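The sparse alternative proposed in SOLR-2583 can be sketched as follows; this is an illustrative stand-in for the attached FileFloatSource patch (the class and method names here are hypothetical), showing the doc -> score map sized by the external file rather than a float[maxDoc] of mostly zeros:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the SOLR-2583 idea: one map entry per line of the external
// scoring file, instead of a dense float array sized to the whole index.
public class SparseExternalScores {
    private final Map<Integer, Float> scores = new HashMap<>();

    /** Record a score parsed from the external file. */
    public void put(int docId, float score) {
        scores.put(docId, score);
    }

    /** Docs absent from the file get the default value, as the dense array would. */
    public float get(int docId, float defaultValue) {
        Float v = scores.get(docId);
        return v == null ? defaultValue : v;
    }

    /** Memory now scales with file entries, not with maxDoc. */
    public int size() {
        return scores.size();
    }
}
```

The trade-off is per-lookup boxing and hashing cost against the O(maxDoc) allocation; for scoring files that cover only a small fraction of the index, the map wins on memory.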
RE: ant javadocs-test-framework failure
Hi, References to local fields must start with #. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: Thursday, June 09, 2011 2:09 AM To: dev@lucene.apache.org Subject: ant javadocs-test-framework failure I'm getting a failure on: ant javadocs-test-framework {@link RANDOM_MULTIPLIER} is failing -- I don't really get why... ryan@bicho~ $ java -version java version 1.6.0_25 Java(TM) SE Runtime Environment (build 1.6.0_25-b06) Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing) ryan@bicho~ $ ant -version Apache Ant version 1.7.1 compiled on June 27 2008
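Uwe's fix can be illustrated with a minimal class (names and values here are illustrative, not the actual test-framework code): a {@link} to a field of the enclosing class needs a leading '#', otherwise the javadoc tool reports it as an unresolved reference.

```java
/**
 * Demonstrates the javadoc rule from the thread: reference a local field as
 * {@link #RANDOM_MULTIPLIER}, not {@link RANDOM_MULTIPLIER} (which fails).
 * Hypothetical class; the constant value is made up for the example.
 */
public class JavadocLinkDemo {
    /** Multiplier applied by {@link #multiplied(int)}. */
    public static final int RANDOM_MULTIPLIER = 1;

    /** Returns {@code n} times {@link #RANDOM_MULTIPLIER}. */
    public static int multiplied(int n) {
        return n * RANDOM_MULTIPLIER;
    }
}
```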
Remove @version tags from JDocs
hey folks, in solr and some lucene classes we have @version tags with svn $Id stuff in there which we got rid of in lucene a while ago. I went through all classes and removed them. I just want to check with everybody if it's OK to commit that. Note: I only changed javadocs; all other usage of $Id etc. still remains. simon
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046451#comment-13046451 ] Michael McCandless commented on LUCENE-3108: I did another review here -- I think it's ready to land on trunk! Nice work Simon! Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch Its time to move another feature from branch to trunk. I want to start this process now while still a couple of issues remain on the branch. Currently I am down to a single nocommit (javadocs on DocValues.java) and a couple of testing TODOs (explicit multithreaded tests and unoptimized with deletions) but I think those are not worth separate issues so we can resolve them as we go. The already created issues (LUCENE-3075 and LUCENE-3074) should not block this process here IMO, we can fix them once we are on trunk. Here is a quick feature overview of what has been implemented: * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, Bytes (fixed / variable size each in sorted, straight and deref variations) * Integration into Flex-API, Codec provides a PerDocConsumer-DocValuesConsumer (write) / PerDocValues-DocValues (read) * By-Default enabled in all codecs except of PreFlex * Follows other flex-API patterns like non-segment reader throw UOE forcing MultiPerDocValues if on DirReader etc. * Integration into IndexWriter, FieldInfos etc. 
* Random-testing enabled via RandomIW - injecting random DocValues into documents * Basic checks in CheckIndex (which runs after each test) * FieldComparator for int and float variants (Sorting, currently directly integrated into SortField, this might go into a separate DocValuesSortField eventually) * Extended TestSort for DocValues * RAM-Resident random access API plus on-disk DocValuesEnum (currently only sequential access) - Source.java / DocValuesEnum.java * Extensible Cache implementation for RAM-Resident DocValues (by-default loaded into RAM only once and freed once IR is closed) - SourceCache.java PS: Currently the RAM resident API is named Source (Source.java) which seems too generic. I think we should rename it to RamDocValues or something like that; suggestions welcome! Any comments, questions (rants :)) are very much appreciated.
RE: Remove @version tags from JDocs
+1 - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Thursday, June 09, 2011 12:25 PM To: dev@lucene.apache.org Subject: Remove @version tags from JDocs hey folks, in solr and some lucene classes we have @version tags with svn $Id stuff in there which we got rid of in lucene a while ago. I went through all classes and removed them. I just want to check with everybody if it's OK to commit that. Note: I only changed javadocs; all other usage of $Id etc. still remains. simon
[jira] [Resolved] (LUCENE-2935) Let Codec consume entire document
[ https://issues.apache.org/jira/browse/LUCENE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2935. - Resolution: Fixed the main infrastructure has been committed to the docvalues branch - moving out here Let Codec consume entire document - Key: LUCENE-2935 URL: https://issues.apache.org/jira/browse/LUCENE-2935 Project: Lucene - Java Issue Type: Improvement Components: core/codecs, core/index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Currently the codec API is limited to consuming Terms Postings upon a segment flush. To enable stored fields and DocValues to make use of the Codec abstraction, codecs should allow pulling a consumer ahead of flush time and consuming all values from a document's fields through a consumer API. An alternative to consuming the entire document would be extending FieldsConsumer to return a StoredValueConsumer / DocValuesConsumer like it is done in the DocValues branch right now, side by side with the TermsConsumer. Yet, extending this has proven to be very tricky and error prone for several reasons: * FieldsConsumer requires SegmentWriteState, which might be different upon flush compared to when the document is consumed. SegmentWriteState must therefore be created twice: 1. when the first docvalues field is indexed 2. when flushed. * FieldsConsumers are currently pulled for each indexed field no matter if there are terms to be indexed or not. Yet, if we use something like DocValuesCodec, which essentially wraps another codec and creates a FieldsConsumer on demand, the wrapped codec's consumer might not be initialized even if the field is indexed. This causes problems once such a field is opened but is missing the required files for that codec. I added some harsh logic to work around this which should be prevented. * SegmentCodecs are created for each SegmentWriteState, which might yield wrong codec IDs depending on how field numbers are assigned.
We currently depend on the fact that all fields for a segment, and therefore their codecs, are known when SegmentCodecs are built. To enable consuming per-doc values in codecs we need to do that incrementally. Codecs should instead provide a DocumentConsumer side by side with the FieldsConsumer created prior to flush. This is also a prerequisite for LUCENE-2621
[jira] [Updated] (LUCENE-3075) DocValues should optionally be stored in a PerCodec CFS file to prevent too many files in the index
[ https://issues.apache.org/jira/browse/LUCENE-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3075: Affects Version/s: (was: CSF branch) 4.0 Fix Version/s: (was: CSF branch) 4.0 update to 4.0 - fix once on trunk DocValues should optionally be stored in a PerCodec CFS file to prevent too many files in the index -- Key: LUCENE-3075 URL: https://issues.apache.org/jira/browse/LUCENE-3075 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently docvalues create one file per field to store the docvalues. Yet this could easily lead to too many open files, so we might need to enable CFS per codec to keep the number of files reasonable.
[jira] [Updated] (LUCENE-3074) SimpleTextCodec needs SimpleText DocValues impl
[ https://issues.apache.org/jira/browse/LUCENE-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3074: Affects Version/s: (was: CSF branch) 4.0 Fix Version/s: (was: CSF branch) 4.0 fix once on trunk SimpleTextCodec needs SimpleText DocValues impl --- Key: LUCENE-3074 URL: https://issues.apache.org/jira/browse/LUCENE-3074 Project: Lucene - Java Issue Type: Task Components: core/index, core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Michael McCandless Fix For: 4.0 Currently SimpleTextCodec uses binary docvalues; we should move that to a simple-text impl. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2186) First cut at column-stride fields (index values storage)
[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2186. - Resolution: Fixed currently landing on LUCENE-3108 First cut at column-stride fields (index values storage) Key: LUCENE-2186 URL: https://issues.apache.org/jira/browse/LUCENE-2186 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, mem.py I created an initial basic impl for storing index values (ie column-stride value storage). This is still a work in progress... but the approach looks compelling. I'm posting my current status/patch here to get feedback/iterate, etc. The code is standalone now, and lives under new package oal.index.values (plus some util changes, refactorings) -- I have yet to integrate into Lucene so eg you can mark that a given Field's value should be stored into the index values, sorting will use these values instead of field cache, etc. It handles 3 types of values: * Six variants of byte[] per doc, all combinations of fixed vs variable length, and stored either straight (good for eg a title field), deref (good when many docs share the same value, but you won't do any sorting) or sorted. * Integers (variable bit precision used as necessary, ie this can store byte/short/int/long, and all precisions in between) * Floats (4 or 8 byte precision) String fields are stored as the UTF8 byte[]. This patch adds a BytesRef, which does the same thing as flex's TermRef (we should merge them). This patch also adds basic initial impl of PackedInts (LUCENE-1990); we can swap that out if/when we get a better impl. This storage is dense (like field cache), so it's appropriate when the field occurs in all/most docs. 
It's just like field cache, except the reading API is a get() method invocation, per document. Next step is to do basic integration with Lucene, and then compare sort performance of this vs field cache. For the sort-by-String-value case, I think RAM usage and GC load of this index values API should be much better than field cache, since it does not create an object per document (instead it shares big long[] and byte[] across all docs), and because the values are stored in RAM as their UTF8 bytes. There are abstract Writer/Reader classes. The current reader impls are entirely RAM resident (like field cache), but the API is (I think) agnostic, ie, one could make an MMAP impl instead. I think this is the first baby step towards LUCENE-1231. Ie, it cannot yet update values, and the reading API is fully random-access by docID (like field cache), not like a posting list, though I do think we should add an iterator() api (to return flex's DocsEnum) -- eg I think this would be a good way to track avg doc/field length for BM25/lnu.ltc scoring. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
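The dense, field-cache-like layout described above can be illustrated with a small self-contained sketch in plain Java. The class and method names are invented for the illustration and are not Lucene code: the key idea is that all documents share one big byte[], and a per-document get() copies out a fixed-size slice instead of holding an object per document.

```java
// Illustrative sketch (not Lucene code) of the dense, fixed-length
// column-stride layout described in the patch: one shared byte[] holds the
// value for every document, back to back, and reading is a random-access
// get() keyed by docID. No per-document object is ever allocated at index
// load time, which is where the RAM/GC win over a per-doc object cache comes
// from.
public class FixedStraightBytesSketch {
    private final byte[] values; // all docs' values in one shared array
    private final int size;      // fixed value length per document

    public FixedStraightBytesSketch(int numDocs, int size) {
        this.values = new byte[numDocs * size];
        this.size = size;
    }

    public void set(int docID, byte[] value) {
        System.arraycopy(value, 0, values, docID * size, size);
    }

    // Random-access read per docID, like the field-cache-style get() above.
    public byte[] get(int docID) {
        byte[] out = new byte[size];
        System.arraycopy(values, docID * size, out, 0, size);
        return out;
    }

    public static void main(String[] args) {
        FixedStraightBytesSketch col = new FixedStraightBytesSketch(3, 2);
        col.set(0, new byte[] {1, 2});
        col.set(2, new byte[] {9, 9});
        System.out.println(java.util.Arrays.toString(col.get(2)));
    }
}
```

The variable-length, deref and sorted variants mentioned in the issue add an indirection (per-doc offsets or ordinals) on top of the same shared-array idea.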
[jira] [Assigned] (LUCENE-1231) Column-stride fields (aka per-document Payloads)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-1231: --- Assignee: Simon Willnauer (was: Michael Busch) Column-stride fields (aka per-document Payloads) Key: LUCENE-1231 URL: https://issues.apache.org/jira/browse/LUCENE-1231 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 This new feature has been proposed and discussed here: http://markmail.org/search/?q=per-document+payloads#query:per-document%20payloads+page:1+mid:jq4g5myhlvidw3oc+state:results Currently it is possible in Lucene to store data as stored fields or as payloads. Stored fields provide good performance if you want to load all fields for one document, because this is a sequential I/O operation. If you however want to load the data from one field for a large number of documents, then stored fields perform quite badly, because lots of I/O seeks might have to be performed. A better way to do this is using payloads. By creating a special posting list that has one posting with payload for each document you can simulate a column-stride field. The performance is significantly better compared to stored fields, however still not optimal. The reason is that for each document the freq value, which is in this particular case always 1, has to be decoded, and also one position value, which is always 0, has to be loaded. As a solution we want to add real column-stride fields to Lucene. 
A possible format for the new data structure could look like this (CSD stands for column-stride data; once we decide on a final name for this feature we can change this): CSDList -- FixedLengthList | VariableLengthList, SkipList FixedLengthList -- Payload^SegSize VariableLengthList -- DocDelta, PayloadLength?, Payload Payload -- Byte^PayloadLength PayloadLength -- VInt SkipList -- see frq.file We distinguish here between the fixed-length and the variable-length cases. To allow flexibility, Lucene could automatically pick the right data structure. This could work like this: When the DocumentsWriter writes a segment it checks whether all values of a field have the same length. If yes, it stores them as a FixedLengthList; if not, then as a VariableLengthList. When the SegmentMerger merges two or more segments it checks if all segments have a FixedLengthList with the same length for a column-stride field. If not, it writes a VariableLengthList to the new segment. Once this feature is implemented, we should think about making the column-stride fields updateable, similar to the norms. This will be a very powerful feature that can for example be used for low-latency tagging of documents. Other use cases: - replace norms - allow storing boost values separately from norms - as input for the FieldCache, thus providing significantly improved loading performance (see LUCENE-831) Things that need to be done here: - decide on a name for this feature :) - I think column-stride fields was liked better than per-document payloads - Design an API for this feature. We should keep in mind here that these fields are supposed to be updateable. - Define data structures. I would like to get this feature into 2.4. Feedback about the open questions is very welcome so that we can finalize the design soon and start implementing. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
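The fixed-vs-variable-length selection rule described in LUCENE-1231 above can be sketched as follows. This is illustrative plain Java, not Lucene code; the class, enum and method names are invented for the sketch.

```java
import java.util.List;

// Sketch of the format-selection rule from the LUCENE-1231 description
// (illustrative only): at flush time, if every value in the segment has the
// same length a fixed-length list suffices; otherwise fall back to a
// variable-length list. At merge time the result stays fixed-length only if
// all inputs are fixed-length with the same per-value length.
public class CsdFormatChooser {

    enum Format { FIXED_LENGTH, VARIABLE_LENGTH }

    // DocumentsWriter-side decision: inspect all values of one field.
    static Format choose(List<byte[]> payloads) {
        int len = payloads.isEmpty() ? -1 : payloads.get(0).length;
        for (byte[] p : payloads) {
            if (p.length != len) {
                return Format.VARIABLE_LENGTH;
            }
        }
        return Format.FIXED_LENGTH;
    }

    // SegmentMerger-side decision for two segments of the same field.
    static Format merge(Format a, int lenA, Format b, int lenB) {
        if (a == Format.FIXED_LENGTH && b == Format.FIXED_LENGTH && lenA == lenB) {
            return Format.FIXED_LENGTH;
        }
        return Format.VARIABLE_LENGTH;
    }

    public static void main(String[] args) {
        // prints FIXED_LENGTH then VARIABLE_LENGTH
        System.out.println(choose(List.of(new byte[4], new byte[4])));
        System.out.println(choose(List.of(new byte[4], new byte[2])));
    }
}
```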
[jira] [Resolved] (LUCENE-1231) Column-stride fields (aka per-document Payloads)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-1231. - Resolution: Duplicate this has been implemented in LUCENE-3108, LUCENE-2935, LUCENE-2168 and LUCENE-1231 moving out Column-stride fields (aka per-document Payloads) Key: LUCENE-1231 URL: https://issues.apache.org/jira/browse/LUCENE-1231 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2463) Using an evaluator outside the scope of an entity results in a null context
Jeffrey, can you supply some more information like data-config.xml, stacktrace and what your delta-query looks like? [ https://issues.apache.org/jira/browse/SOLR-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046346#comment-13046346 ] Jeffrey Chang commented on SOLR-2463: - I just tried delta-imports on 3.2, this is still unresolved. I also tried applying SOLR-2186 patch but no luck. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046467#comment-13046467 ] Robert Muir commented on LUCENE-3186: - do we really need to do this? I guess also looking at LUCENE-3187, I think I'm against this trend. Shall we put analyzer classnames in there too? If we are going to put docvalues type and precision step, well then I want the stopwords file in the fnx file too! At some point, if a user is going to shoot themselves in the foot, we simply cannot stop them, and I don't think it's our job to. DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently the segment merger fails if the docvalues type is not compatible across segments. We already catch this problem if somebody changes the value type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers. I marked this 4.0 since it should not block the landing on trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2372) Upgrade Solr to Tika 0.9
[ https://issues.apache.org/jira/browse/SOLR-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2372: -- Component/s: contrib - Solr Cell (Tika extraction) Priority: Major (was: Trivial) Fix Version/s: 3.3 Marking for 3.3 and bumping priority to major due to the good cost/benefit ratio, especially for PDF parsing. I'd love to contribute but I think this kind of change cannot be done with a patch. Upgrade Solr to Tika 0.9 Key: SOLR-2372 URL: https://issues.apache.org/jira/browse/SOLR-2372 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Grant Ingersoll Fix For: 3.3 as the title says -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3108: Attachment: LUCENE-3108_CHANGES.patch here is a changes entry for docvalues - comments welcome Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch It's time to move another feature from branch to trunk. I want to start this process now while still a couple of issues remain on the branch. Currently I am down to a single nocommit (javadocs on DocValues.java) and a couple of testing TODOs (explicit multithreaded tests and unoptimized with deletions), but I think those are not worth separate issues, so we can resolve them as we go. The already created issues (LUCENE-3075 and LUCENE-3074) should not block this process here IMO; we can fix them once we are on trunk. Here is a quick feature overview of what has been implemented: * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, Bytes (fixed / variable size, each in sorted, straight and deref variations) * Integration into the flex API; Codec provides a PerDocConsumer / DocValuesConsumer (write) and PerDocValues / DocValues (read) * Enabled by default in all codecs except PreFlex * Follows other flex-API patterns, like non-segment readers throwing UOE, forcing MultiPerDocValues if on a DirReader, etc. * Integration into IndexWriter, FieldInfos etc. 
* Random testing enabled via RandomIW - injecting random DocValues into documents * Basic checks in CheckIndex (which runs after each test) * FieldComparator for int and float variants (sorting, currently integrated directly into SortField; this might go into a separate DocValuesSortField eventually) * Extended TestSort for DocValues * RAM-resident random-access API plus on-disk DocValuesEnum (currently only sequential access) - Source.java / DocValuesEnum.java * Extensible cache implementation for RAM-resident DocValues (by default loaded into RAM only once and freed once the IR is closed) - SourceCache.java PS: Currently the RAM-resident API is named Source (Source.java), which seems too generic. I think we should rename it to RamDocValues or something like that; suggestions welcome! Any comments, questions (rants :)) are very much appreciated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046473#comment-13046473 ] Uwe Schindler commented on LUCENE-3108: --- One small issue: There seems to be a merge missing in file TestIndexSplitter, the changes in there are unrelated, so this reverts a commit on trunk for improving tests. The problem with the README.txt is already fixed. ...still digging Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046474#comment-13046474 ] Simon Willnauer commented on LUCENE-3108: - bq. There seems to be a merge missing in file TestIndexSplitter, the changes in there are unrelated, so this reverts a commit on trunk for improving tests. fixed revision 1133794 thanks uwe! Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #147: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/147/ No tests ran. Build Log (for compile errors): [...truncated 8340 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Remove @version tags from JDocs
+1 On Thu, Jun 9, 2011 at 6:24 AM, Simon Willnauer simon.willna...@googlemail.com wrote: hey folks, in solr and some lucene classes we have @version tags with svn $Id stuff in there, which we got rid of in lucene a while ago. I went through all classes and removed them. I just want to check with everybody if it's ok to commit that. Note: I only changed javadocs; all other usage of $Id etc still remains. simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046479#comment-13046479 ] Uwe Schindler commented on LUCENE-3186: --- Hi Robert, I am also not really happy with this trend. I just opened LUCENE-3187 to start a discussion. In my opinion we should improve the documentation instead. DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3187) Store NumericField precisionStep in fnx file
[ https://issues.apache.org/jira/browse/LUCENE-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046480#comment-13046480 ] Uwe Schindler commented on LUCENE-3187: --- Robert commented on LUCENE-3186: {quote} do we really need to do this? I guess also looking at LUCENE-3187, I think I'm against this trend. Shall we put analyzer classnames in there too? If we are going to put docvalues type and precision step, well then I want the stopwords file in the fnx file too! At some point, if a user is going to shoot themselves in the foot, we simply cannot stop them, and I don't think it's our job to. {quote} I am also not really happy with this trend. I just opened LUCENE-3187 to start a discussion. In my opinion we should improve the documentation instead. Store NumericField precisionStep in fnx file Key: LUCENE-3187 URL: https://issues.apache.org/jira/browse/LUCENE-3187 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.9, 3.0, 3.1, 3.2 Reporter: Uwe Schindler This is a similar problem to LUCENE-3186: The following question was sent to the user list: [http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106.mbox/%3c614c529d389a5944b351f7dfb7594f24012aa...@uksrpblkexb01.detica.com%3E] The main problem is that you have to pass the precision step and must know the field type of numeric fields before doing a query, else you get wrong results. We can maybe store the type and precision step in the fnx file (like we do for stored numeric fields in FieldsWriter). I am not sure what's the best way to do it (without too much code specialization), but it seems a good idea. On the other hand, we don't store references to the Analyzer in the fnx file, so why for numeric fields (it's just like an analyzer - if you change it, results are wrong)? -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3186) DocValues type should be recored in FNX file to early fail if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046481#comment-13046481 ] Simon Willnauer commented on LUCENE-3186: - I think for this issue we can compute that info at IW open time: we can simply run through the FIs and prepopulate the info. I think this is better than redundantly storing this info. DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
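The fail-fast idea from the comment above (prepopulate the per-field type from the existing field infos at IndexWriter open time and reject incompatible later updates) might be sketched like this. This is plain Java with invented names, not the real FieldInfos API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the comment above: on writer open, run through the
// known fields once, remember each field's DocValues type as a string, and
// reject an attempt to index the same field with an incompatible type -
// failing at add time instead of much later at segment merge. Names and the
// string-typed representation are assumptions made for this sketch.
public class DocValuesTypeCheck {

    private final Map<String, String> typeByField = new HashMap<>();

    // Prepopulate from the index's existing field infos at open time.
    public DocValuesTypeCheck(Map<String, String> existingFieldInfos) {
        typeByField.putAll(existingFieldInfos);
    }

    // Called when a field is about to be indexed with a DocValues type.
    public void check(String field, String type) {
        String prev = typeByField.putIfAbsent(field, type);
        if (prev != null && !prev.equals(type)) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" already has DocValues type " + prev + ", cannot change to " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("price", "FLOAT_64");
        DocValuesTypeCheck check = new DocValuesTypeCheck(existing);
        check.check("price", "FLOAT_64"); // compatible: ok
        check.check("price", "INTS");     // incompatible: throws
    }
}
```

This keeps the check entirely in memory, matching the comment's point that nothing new needs to be stored redundantly in the fnx file.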
[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046496#comment-13046496 ] Simon Willnauer commented on LUCENE-2955: - Mike, nice work so far :) I have to admit that I really don't like the reopen thread. I think reopen in the background should be abstracted, and the reopen thread should not be part of the core manager. By default I think we should consult a ReopenStrategy on change and hijack indexing threads to reopen the reader. We can still synchronize the reopening with a lock.tryLock() and by default go with a timed reopen policy. Thoughts? simon Add utility class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch I created a simple class, NRTManager, that tries to abstract away some of the reopen logic when using NRT readers. You give it your IW, tell it the min and max nanoseconds of staleness you can tolerate, and it privately runs a reopen thread to periodically reopen the searcher. It subsumes the SearcherManager from LIA2. Besides running the reopen thread, it also adds the notion of a generation containing changes you've made. So eg it has addDocument, returning a long. You can then take that long value and pass it back to the getSearcher method, and getSearcher will return a searcher that reflects the changes made in that generation. This gives your app the freedom to force immediate consistency (ie wait for the reopen) only for those searches that require it, like a verifier that adds a doc and then immediately searches for it, but also use eventual consistency for other searches. I want to also add support for the new applyDeletions option when pulling an NRT reader. 
Also, this is very new and I'm sure buggy -- the concurrency is either wrong or overly locking. But it's a start... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
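The "generation" bookkeeping described above can be modeled in a few lines. This is a self-contained sketch of the idea only, not the NRTManager API; all class and method names below are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the generation idea: each change bumps an indexing
// generation, a reopen publishes the generation it covered, and a caller
// can check whether a searcher already reflects a given change.
public class GenerationModel {
    private final AtomicLong indexingGen = new AtomicLong(0); // last change made
    private volatile long searcherGen = 0;                    // last change visible to searches

    // Analogous to addDocument(...) returning a generation token.
    public long addDocument() {
        return indexingGen.incrementAndGet();
    }

    // Analogous to the periodic reopen: the new searcher covers all changes so far.
    public void reopen() {
        searcherGen = indexingGen.get();
    }

    // Analogous to getSearcher(gen): the change is visible only once a
    // reopen has caught up to the requested generation.
    public boolean isVisible(long gen) {
        return searcherGen >= gen;
    }
}
```

A verifier-style caller would hold the long returned by addDocument() and search only once isVisible(gen) is true (waiting for, or triggering, a reopen); callers that can accept eventual consistency simply ignore the generation.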
[jira] [Updated] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3108: Attachment: LUCENE-3108.patch Here is the latest diff for docvalues. I will now reintegrate the branch and post diffs later. Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch It's time to move another feature from branch to trunk. I want to start this process now while a couple of issues still remain on the branch. Currently I am down to a single nocommit (javadocs on DocValues.java) and a couple of testing TODOs (explicit multithreaded tests and unoptimized with deletions), but I think those are not worth separate issues, so we can resolve them as we go. The already created issues (LUCENE-3075 and LUCENE-3074) should not block this process IMO; we can fix them once we are on trunk. Here is a quick feature overview of what has been implemented: * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, Bytes (fixed / variable size, each in sorted, straight and deref variations) * Integration into the flex API; a Codec provides a PerDocConsumer-DocValuesConsumer (write) / PerDocValues-DocValues (read) * Enabled by default in all codecs except PreFlex * Follows other flex-API patterns, e.g. non-segment readers throw UOE, forcing MultiPerDocValues if on a DirReader, etc. * Integration into IndexWriter, FieldInfos etc. 
* Random testing enabled via RandomIW - injecting random DocValues into documents * Basic checks in CheckIndex (which runs after each test) * FieldComparator for int and float variants (sorting; currently directly integrated into SortField, this might go into a separate DocValuesSortField eventually) * Extended TestSort for DocValues * RAM-resident random access API plus on-disk DocValuesEnum (currently only sequential access) - Source.java / DocValuesEnum.java * Extensible cache implementation for RAM-resident DocValues (by default loaded into RAM only once and freed once the IR is closed) - SourceCache.java PS: Currently the RAM-resident API is named Source (Source.java), which seems too generic. I think we should rename it to RamDocValues or something like that; suggestions welcome! Any comments, questions (rants :)) are very much appreciated.
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3183: Attachment: LUCENE-3183_test.patch I tried to debug this a little last night... it's some off-by-one in reset() (this shoves a negative ord into the terms dictionary cache, which jacks things up later). The test passes on 3.x; I also generated a 3.x index and CheckIndex'd it with trunk to verify that the problem isn't in PreFlex-RW but is actually in PreFlex-R... but I didn't manage to come up with any non-hacky solution for the off-by-one... TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Attachments: LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 [junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] 
[junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at
[jira] [Commented] (SOLR-2580) Create a new Search Component to alter queries based on business rules.
[ https://issues.apache.org/jira/browse/SOLR-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046514#comment-13046514 ] Tomás Fernández Löbbe commented on SOLR-2580: - Basically, it's just another component designed to modify the relevance of documents, as the QueryElevationComponent is. Of course, this could be implemented by each site on the application layer, but I think it would be very helpful to write one reusable component; then everybody can use the same one, nobody reinvents the wheel, and the effort can go into improving it. Should it be included in Solr? Personally I think this is something that can be useful to many people and it will add value to Solr. In the end, the community and the committers will decide whether they think this is worthwhile or not. JBoss AS is the application server, but JBoss is also an organization that runs many projects (like Drools). You don't need to use any application server in particular to make Drools work. It's a library, not an application itself. Create a new Search Component to alter queries based on business rules. Key: SOLR-2580 URL: https://issues.apache.org/jira/browse/SOLR-2580 Project: Solr Issue Type: New Feature Reporter: Tomás Fernández Löbbe The goal is to be able to adjust the relevance of documents based on user-defined business rules. For example, in an e-commerce site, when the user chooses the shoes category, we may be interested in boosting products from a certain brand. This can be expressed as a rule in the following way: rule Boost Adidas products when searching shoes when $qt : QueryTool() TermQuery(term.field==category, term.text==shoes) then $qt.boost({!lucene}brand:adidas); end The QueryTool object should be used to alter the main query in an easy way. 
Even more human-like rules can be written: rule Boost Adidas products when searching shoes when Query has term shoes in field product then Add boost query {!lucene}brand:adidas end These rules are written in a text file in the config directory and can be modified at runtime. Rules will be managed using JBoss Drools: http://www.jboss.org/drools/drools-expert.html On a first stage, it will allow adding boost queries or changing sort fields based on the user query, but it could be extended to allow more options.
[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046518#comment-13046518 ] Chris Male commented on LUCENE-2955: I agree with Simon. I think providing a ReopenStrategy abstraction will be helpful. Add utility class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch
[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure
[ https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046522#comment-13046522 ] Robert Muir commented on LUCENE-3111: - {quote} I can reproduce - Gabriele's test class's setUp() method calls super.setUp(), but when I run the test the error message about needing to call super.setUp() is emitted, and the test fails. I don't know how to diagnose this problem, though. {quote} You must use junit 4.7 (not 4.8). In junit 4.8 TestWatchMan.starting() is fired before the @Befores, but not in 4.7 (This behavior annoyed me in 4.7 by the way). I definitely don't mind opening a new issue to switch to 4.8 as a minimum requirement. TestFSTs.testRandomWords failure Key: LUCENE-3111 URL: https://issues.apache.org/jira/browse/LUCENE-3111 Project: Lucene - Java Issue Type: Bug Reporter: selckin Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: LUCENE-3111.patch Was running some while(1) tests on the docvalues branch (r1103705) and the following test failed: {code} [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs [junit] Testcase: testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED [junit] expected:771 but was:TwoLongs:771,771 [junit] junit.framework.AssertionFailedError: expected:771 but was:TwoLongs:771,771 [junit] at org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915) [junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) [junit] [junit] [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec [junit] [junit] - Standard Error - [junit] NOTE: Ignoring nightly-only test method 'testBigSet' [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0 [junit] NOTE: test params are: codec=PreFlex, locale=ar, timezone=America/Blanc-Sablon [junit] NOTE: all tests run in this JVM: [junit] [TestToken, TestCodecs, TestIndexReaderReopen, TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, TestUnicodeUtil, TestFSTs] [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 (64-bit)/cpus=8,threads=1,free=137329960,total=208207872 [junit] - --- [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED {code} I am not able to reproduce -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3108: Attachment: LUCENE-3108.patch Patch that reflects the last changes to sync with trunk after I ran svn merge -reintegrate. The reintegrated branch looks good, no unchanged additions etc. I think we are ready to land this on trunk... I will wait a day or two in case somebody has objections. Here is my +1 to commit. Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046532#comment-13046532 ] Koji Sekiguchi commented on SOLR-2582: -- Duplicate of SOLR-2579 ? UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3 In UIMAUpdateRequestProcessor, the catch block in the processAdd() method can throw a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
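The failure mode described above reduces to a one-line guard on substring. A minimal self-contained sketch of the bug and the usual fix (the class and method names are hypothetical, not the actual UIMAUpdateRequestProcessor code):

```java
// Model of the SOLR-2582 failure: composing an error snippet with
// text.substring(0, 100) throws when the text is shorter than 100 chars.
public class ErrorSnippet {
    // Buggy variant: implicitly assumes the text is at least 100 chars long.
    static String snippetBuggy(String text) {
        return text.substring(0, 100); // StringIndexOutOfBoundsException for short text
    }

    // Fixed variant: clamp the end index to the actual text length.
    static String snippetFixed(String text) {
        return text.substring(0, Math.min(100, text.length()));
    }

    public static void main(String[] args) {
        String shortText = "tiny doc";
        try {
            snippetBuggy(shortText);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("buggy variant threw: " + e.getMessage());
        }
        System.out.println(snippetFixed(shortText)); // safe for any length
    }
}
```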
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046534#comment-13046534 ] Elmer Garduno commented on SOLR-2582: - Sorry, it seemed to me to be a duplicate, but I see it's a different problem. I've removed the link. UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046537#comment-13046537 ] Ryan McKinley commented on LUCENE-3108: --- +1 This looks great. To avoid more svn work, I think committing soon is better than later. Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046544#comment-13046544 ] Tommaso Teofili commented on SOLR-2582: --- I think they're related, but the approach proposed here is slightly different since it considers the uniqueKey instead of the analyzed text as the alternative to the logField. Maybe the best solution is applying the patch in SOLR-2579 and then making the error message more useful with additional debugging information. UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3
[jira] [Commented] (SOLR-1804) Upgrade Carrot2 to 3.2.0
[ https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046545#comment-13046545 ] David Smiley commented on SOLR-1804: Good point Rob. If any use of Guava in a patch to Solr core is going to get reverted, then we might as well recognize that now and move Guava from Solr's lib to clustering's lib directory. Upgrade Carrot2 to 3.2.0 Key: SOLR-1804 URL: https://issues.apache.org/jira/browse/SOLR-1804 Project: Solr Issue Type: Improvement Components: contrib - Clustering Reporter: Grant Ingersoll Assignee: Grant Ingersoll Fix For: 3.1, 4.0 Attachments: SOLR-1804-carrot2-3.4.0-dev-trunk.patch, SOLR-1804-carrot2-3.4.0-dev.patch, SOLR-1804-carrot2-3.4.0-libs.zip, SOLR-1804.patch, carrot2-core-3.4.0-jdk1.5.jar http://project.carrot2.org/release-3.2.0-notes.html Carrot2 is now LGPL free, which means we should be able to bundle the binary!
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046564#comment-13046564 ] Yonik Seeley commented on SOLR-2583: Yeah, this will help for sparse fields, but hurt quite a bit for non-sparse ones. Seems like we should make it an option (sparse=true/false on the fieldType definition)? Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring consumes a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the total number of docs (this is also done if the file to load is not found). If there are far fewer entries in the scoring file than docs in total, the big float array wastes a lot of memory. This could be optimized by using a map of doc -> score, so that the map contains only as many entries as there are scoring entries in the external file.
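The tradeoff being discussed can be sketched in a few lines. This is a self-contained illustration of dense-array versus sparse-map storage, not the actual FileFloatSource code; all names below are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the SOLR-2583 memory tradeoff: a dense float[] costs 4 bytes
// per document in the index regardless of how many docs are scored, while
// a doc -> score map costs per *entry* (boxed, so more per entry), winning
// only when the scoring file is sparse relative to maxDoc.
public class ExternalScores {
    // Dense representation: one slot per doc, default score 0.0f.
    static float[] denseScores(int maxDoc, Map<Integer, Float> fileEntries) {
        float[] scores = new float[maxDoc]; // ~4 * maxDoc bytes, even if nearly empty
        for (Map.Entry<Integer, Float> e : fileEntries.entrySet()) {
            scores[e.getKey()] = e.getValue();
        }
        return scores;
    }

    // Sparse representation: only docs present in the external file are stored.
    static float lookupSparse(Map<Integer, Float> scores, int doc) {
        return scores.getOrDefault(doc, 0.0f); // default score for unscored docs
    }

    public static void main(String[] args) {
        Map<Integer, Float> entries = new HashMap<>();
        entries.put(3, 2.5f); // a single scored doc
        int maxDoc = 1_000_000;
        // Dense: ~4 MB for 1M docs even with one entry; sparse: one map entry.
        float[] dense = denseScores(maxDoc, entries);
        System.out.println(dense[3] + " " + lookupSparse(entries, 42));
    }
}
```

This also shows why Yonik's sparse=true/false option makes sense: for a scoring file covering most docs, the dense array is both smaller and faster than a boxed map, so neither representation dominates.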
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #144: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/144/ No tests ran. Build Log (for compile errors): [...truncated 7478 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3183: --- Attachment: LUCENE-3183.patch Patch. Turns out this is a long standing corner-case bug... the problem only happens if you seek to the empty term (field= and text=), and you use termsIndexInterval=1. TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Attachments: LUCENE-3183.patch, LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 [junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] [junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, 
java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) [junit] at
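The corner case McCandless describes (seeking the empty term with termsIndexInterval=1) can be modeled with a small standalone sketch. This is not Lucene's actual TermInfosReader code; it just shows how a standard binary-search-with-adjustment over the term index yields position -1 for a term that sorts before every indexed term — which the empty term always does — and how clamping to 0 avoids the AIOOBE:

```java
import java.util.Arrays;

/**
 * Minimal model (not Lucene's code) of the LUCENE-3183 corner case: with
 * termsIndexInterval=1 every term is in the index, and seeking the empty
 * term ("" sorts before everything) makes the adjusted binary search
 * return -1, which is then used to index an array.
 */
public class SeekEmptyTermDemo {
    static final String[] INDEX_TERMS = {"apple", "banana", "cherry"};

    // Buggy variant: returns -1 for any term sorting before the first entry.
    public static int indexPositionBuggy(String term) {
        int i = Arrays.binarySearch(INDEX_TERMS, term);
        // Not found: binarySearch returns -(insertionPoint) - 1; the usual
        // adjustment points at the preceding index term, which is -1 when
        // the sought term sorts before INDEX_TERMS[0].
        return i >= 0 ? i : (-i - 1) - 1;
    }

    // Fixed variant: clamp to 0 so the subsequent array access is safe.
    public static int indexPosition(String term) {
        return Math.max(0, indexPositionBuggy(term));
    }
}
```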
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046597#comment-13046597 ] Robert Muir commented on LUCENE-3183: - nice, is there an alternative to the extra if per scan()? Like, my hack (not sure if it's correct) was to never add -1 to the terms cache... so this would affect fewer queries (e.g. range queries and MTQs, since they bypass the cache anyway)?
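Muir's alternative — never adding -1 to the terms cache — can be sketched as a cache wrapper that refuses to store the sentinel position. The class and method names here are invented for illustration, not Lucene's API; the point is only that a cache hit can then never hand a negative ordinal back to code that indexes an array with it:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hedged sketch of "never cache -1" (hypothetical names, not Lucene's
 * TermInfosReader): the -1 "before first term" position is simply never
 * stored, so it is recomputed on the next lookup instead of being served
 * from the cache to a caller that would use it as an array index.
 */
public class TermOrdCache {
    private final Map<String, Integer> cache = new HashMap<>();

    public void put(String term, int ord) {
        if (ord >= 0) { // skip the sentinel; a miss forces a fresh seek
            cache.put(term, ord);
        }
    }

    public Integer get(String term) {
        return cache.get(term); // null = miss, caller re-seeks
    }
}
```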
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046602#comment-13046602 ] Michael McCandless commented on LUCENE-3186: I think there are a few separate questions here... Today, on the doc values branch, if you mix up your doc values, i.e. a field foo is at first indexed as a FLOAT_32 and then later you change your mind and later docs index field foo as BYTES_FIXED_STRAIGHT, then this is bad news right now because everything will index fine, you can close your IW, etc., but at some later time merges will hit unrecoverable exceptions. You'll have no choice but to fully rebuild the index, which is rather awful. However, this is true even for cases you would expect to work, e.g. say foo was BYTES_FIXED_STRAIGHT but then later you decided you will want to sort on this field and so you use BYTES_FIXED_SORTED. (Simon: this also results in an exception, I think...?) Ideally we should do the right thing here and upgrade the BYTES_FIXED_STRAIGHT to BYTES_FIXED_SORTED (I think) -- Simon, is there an issue open for this? So, I think the first question here is: which cases should be merged properly and which should be considered an error? Probably we have to work out the full matrix... Then the second question is, for the error cases (if any!), can/should we detect this up front, as you're indexing? Then the third question is, if we want to detect up front, do we do that w/ the fnx file or do we do that on init of IW (= no index change). DocValues type should be recorded in FNX file to early fail if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently the segment merger fails if the doc values type is not compatible across segments. We already catch this problem if somebody changes the values type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers. I marked this 4.0 since it should not block the landing on trunk -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
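The "fail early" idea being debated — reject a conflicting doc values type at add-document time instead of letting a merge blow up much later — can be sketched with a per-field type registry. The enum values mirror the type names mentioned in the thread, but the class and method names are hypothetical, not Lucene's API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hedged sketch of early doc-values type checking (hypothetical API):
 * remember the first type seen per field and reject a conflicting type
 * immediately, rather than discovering the mismatch during a merge.
 */
public class DocValuesTypeRegistry {
    public enum Type { FLOAT_32, BYTES_FIXED_STRAIGHT, BYTES_FIXED_SORTED }

    private final ConcurrentMap<String, Type> types = new ConcurrentHashMap<>();

    public void check(String field, Type type) {
        Type prev = types.putIfAbsent(field, type);
        if (prev != null && prev != type) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" already uses " + prev
                + ", cannot also index it as " + type);
        }
    }
}
```

A fuller version would encode the compatibility matrix McCandless asks for (e.g. treating a STRAIGHT-to-SORTED upgrade as legal) instead of requiring exact equality.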
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046604#comment-13046604 ] Robert Muir commented on LUCENE-3186: - {quote} So, I think the first question here is: which cases should be merged properly and which should be considered an error? Probably we have to work out the full matrix... {quote} These are all implementation details of doc values that it must deal with during merging. I think it should work out the LCD and merge to that. This is no different than if I have a field with all 8-character terms and then I add a 10-character term: sure, my impl/codec's encoding could internally rely upon the fact that all terms are 8 chars, but it must transparently change its encoding to then support both 8- and 10-character terms and not throw an error. If you mix up your doc values with ints and floats and bytes, isn't the least common denominator always bytes? (Just encode the int as 4 bytes or whatever.) So in other words, I think it's up to doc values to change its encoding to support the LCD, which might mean downgrading ints to bytes or whatever; my only opinion is that it should never 'create' data (this was my issue with fake norms, let's not do that).
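Muir's "just encode the int as 4 bytes" point is easy to demonstrate: widening an int or float to a fixed-length byte[] is lossless and round-trips exactly, so merging mixed numeric fields down to a bytes representation never has to invent data. This is a standalone illustration, not Lucene's merge code:

```java
import java.nio.ByteBuffer;

/**
 * Illustration of the least-common-denominator idea: an int or float
 * widens losslessly to a fixed 4-byte array, so mixed int/float/bytes
 * fields could merge down to a BYTES_FIXED_* representation instead of
 * failing. Sketch only, not Lucene's actual merge logic.
 */
public class LcdEncodeDemo {
    public static byte[] intAsBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static byte[] floatAsBytes(float v) {
        return ByteBuffer.allocate(4).putFloat(v).array();
    }

    public static int bytesAsInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static float bytesAsFloat(byte[] b) {
        return ByteBuffer.wrap(b).getFloat();
    }
}
```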
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046609#comment-13046609 ] Michael McCandless commented on LUCENE-3183: bq. nice, is there an alternative to if per-scan()? I think your idea should work; the bug is really in STE.scanTo, but since we only call this method in 2 places, and these classes are package-private in 3.x, and I think it's unlikely apps will directly use STE from the PreFlex codec on trunk, I think we can work around it in these places. You're right that this saves an if in many cases... I'll put comments explaining it.
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3183: Attachment: LUCENE-3183.patch here's my hack patch
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3183: --- Attachment: LUCENE-3183.patch Patch using Robert's idea... I think it's ready to commit.
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046618#comment-13046618 ] Robert Muir commented on LUCENE-3183: - +1, I think the comments are definitely necessary... this code is tricky :)
[jira] [Resolved] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3183. Resolution: Fixed Assignee: Michael McCandless
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046622#comment-13046622 ] Michael McCandless commented on LUCENE-3183: Thanks selckin! Keep feeding that awesome random-number-generator you've got over there!!
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046627#comment-13046627 ] Robert Muir commented on LUCENE-3183: - I agree, i guestimated (running -Dtests.iter=1 and seeing 5 fails) the chance of finding this seed is like 1-in-2000! TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Assignee: Michael McCandless Attachments: LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 [junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] [junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, 
source=flush, os.arch=amd64, java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) 
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) [junit]
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046635#comment-13046635 ] Michael McCandless commented on LUCENE-2793: bq. It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfo I think we should pass down readOnce=true for these cases? EG some kind of caching dir (or something) would know not to bother caching such files... Same for del docs, terms index, doc values (well, sometimes), etc. bq. it seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext to SegmentWriteState and SegmentReadState that way we don't need to change the Codec interface and clutter it with internals. This would also fix mikes comment for FieldsConsumer etc. +1 that's great. bq. I really don't like OneMerge I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now. If we can't include OneMerge, and I agree it'd be nice not to, I think we should try hard to pull stuff out of OneMerge that may be of interest to a Dir impl? Maybe: * estimatedTotalSegmentSizeBytes * docCount * optimize/expungeDeletes * isExternal (so Dir can know if this is addIndexes vs normal merging) bq. Regarding the IOContext class I think we should design for what we have right now and since SegementInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above. I agree, let's wait until we have a need. In fact... SegmentInfo for flush won't work: we go and open all files for flushing, write to them, close them, and only then do we make the SegmentInfo. So it seems like we should also have some abtracted stuff about the to-be-flushed segment? 
Maybe for starters the estimatedSegmentSizeBytes? EG, NRTCachingDir could use this to decide whether to cache the new segment (today it fragile-ly relies on the app to open new NRT reader frequently enough). Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice/versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
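The IOContext shape being discussed in this thread (a small context object passed to Directory.createOutput/openInput, carrying abstracted merge details instead of OneMerge itself, plus hints like readOnce) might look roughly like the sketch below. The class and field names here are illustrative guesses based on the comments above, not the API that was ultimately committed.

```java
// Illustrative sketch of the IOContext idea: a value class a Directory impl
// can inspect to adapt buffering/caching to how a file will be used.
public class IOContextSketch {
    public enum Context { READ, MERGE, FLUSH, DEFAULT }

    /** Abstracted merge details, instead of exposing IndexWriter's OneMerge. */
    public static final class MergeInfo {
        public final long estimatedMergeBytes;
        public final int totalDocCount;
        public final boolean isExternal; // addIndexes vs. normal merging

        public MergeInfo(long estimatedMergeBytes, int totalDocCount, boolean isExternal) {
            this.estimatedMergeBytes = estimatedMergeBytes;
            this.totalDocCount = totalDocCount;
            this.isExternal = isExternal;
        }
    }

    public final Context context;
    public final MergeInfo mergeInfo; // null unless context == MERGE
    public final boolean readOnce;    // hint: file is read fully once (e.g. FieldInfos)

    public IOContextSketch(Context context, MergeInfo mergeInfo, boolean readOnce) {
        this.context = context;
        this.mergeInfo = mergeInfo;
        this.readOnce = readOnce;
    }

    /** E.g. a caching Directory (like NRTCachingDir) could skip read-once files
     *  and cache merge output only when it fits a budget. */
    public boolean shouldCache(long cacheBudgetBytes) {
        if (readOnce) return false;
        if (context == Context.MERGE && mergeInfo != null) {
            return mergeInfo.estimatedMergeBytes <= cacheBudgetBytes;
        }
        return true;
    }
}
```

The shouldCache method is only a toy policy showing how a Directory could use such hints; real caching decisions would be up to each Directory implementation.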
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046641#comment-13046641 ] Michael McCandless commented on LUCENE-3179: I think we should just commit this? It's a useful API. LUCENE-3171 (alternative nested docs impl w/ single pass collector) also could use this. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 . -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
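The operation being proposed (find the nearest set bit at or below a given index, the mirror image of nextSetBit) can be sketched over a raw long[] as follows. This is a self-contained illustration of the algorithm, not OpenBitSet's actual code.

```java
// Sketch of prevSetBit: scan backwards from 'index', first masking off the
// bits above 'index' in its word, then walking to lower words as needed.
public class PrevSetBitSketch {
    /** Returns the index of the first set bit at or below 'index', or -1 if none. */
    public static int prevSetBit(long[] bits, int index) {
        if (index < 0 || bits.length == 0) return -1;
        int word = index >> 6;            // which long holds this bit
        if (word >= bits.length) {        // clamp to the last stored word
            word = bits.length - 1;
            index = (word << 6) + 63;
        }
        // Keep only bits 0..index within the current word.
        long w = bits[word] & (-1L >>> (63 - (index & 63)));
        while (true) {
            if (w != 0) {
                // Highest set bit of w, offset by the word's base index.
                return (word << 6) + 63 - Long.numberOfLeadingZeros(w);
            }
            if (--word < 0) return -1;
            w = bits[word];
        }
    }
}
```

For the nested-document use case mentioned above, a collector would call this with a child document's id to locate its parent (the preceding set bit in the parent filter).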
[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046644#comment-13046644 ] Jason Rutherglen commented on LUCENE-2955: -- Perhaps we can merge this functionality with SOLR-2565 and/or SOLR-2566, such that Solr utilizes it for reader opening. However why would this issue use a background thread and Solr performs a max time reopen? Add utitily class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch I created a simple class, NRTManager, that tries to abstract away some of the reopen logic when using NRT readers. You give it your IW, tell it min and max nanoseconds staleness you can tolerate, and it privately runs a reopen thread to periodically reopen the searcher. It subsumes the SearcherManager from LIA2. Besides running the reopen thread, it also adds the notion of a generation containing changes you've made. So eg it has addDocument, returning a long. You can then take that long value and pass it back to the getSearcher method and getSearcher will return a searcher that reflects the changes made in that generation. This gives your app the freedom to force immediate consistency (ie wait for the reopen) only for those searches that require it, like a verifier that adds a doc and then immediately searches for it, but also use eventual consistency for other searches. I want to also add support for the new applyDeletions option when pulling an NRT reader. Also, this is very new and I'm sure buggy -- the concurrency is either wrong over overly-locking. But it's a start... -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
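The "generation" contract described in the issue (each change returns a token; a searcher reflects a generation only after a reopen covers it) can be illustrated with a deliberately simple, single-threaded toy. This mimics the API idea only; the real NRTManager adds the background reopen thread and concurrency control.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the generation idea: addDocument returns a generation token,
// refresh() plays the role of the periodic reopen, and isSearchable(gen)
// tells the caller whether a searcher covering that change exists yet.
public class GenerationSketch {
    private final List<String> pending = new ArrayList<>();
    private final List<String> visible = new ArrayList<>();
    private long indexedGen = 0;    // generation of the latest change
    private long searchableGen = 0; // generation covered by the current "searcher"

    /** Like NRTManager.addDocument: returns the generation holding this change. */
    public long addDocument(String doc) {
        pending.add(doc);
        return ++indexedGen;
    }

    /** Stand-in for the background reopen: make all pending changes searchable. */
    public void refresh() {
        visible.addAll(pending);
        pending.clear();
        searchableGen = indexedGen;
    }

    /** True once a searcher reflecting 'gen' is available. */
    public boolean isSearchable(long gen) {
        return searchableGen >= gen;
    }

    public boolean search(String doc) {
        return visible.contains(doc);
    }
}
```

An app needing immediate consistency (the add-then-verify case in the description) would hold the returned generation and wait until isSearchable(gen) before searching; other searches just use whatever generation is current.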
[Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuild them). Thanks Bob
RE: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
Lucene.Net 2.9.2 should be able to read the index created with 1.9 without any problem. Can you try to search with luke (http://www.getopt.org/luke/luke-0.9.9/lukeall-0.9.9.jar ) and iterate over the results? DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuilt them). Thanks Bob=
[jira] [Resolved] (LUCENE-3152) MockDirectoryWrapper should wrap the lockfactory
[ https://issues.apache.org/jira/browse/LUCENE-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3152. - Resolution: Fixed MockDirectoryWrapper should wrap the lockfactory Key: LUCENE-3152 URL: https://issues.apache.org/jira/browse/LUCENE-3152 Project: Lucene - Java Issue Type: Bug Components: general/test Reporter: Robert Muir Fix For: 3.3, 4.0 Attachments: LUCENE-3152.patch After applying the patch from LUCENE-3147, I added a line to make the test fail if it cannot remove its temporary directory. I ran 'ant test' on linux 50 times, and it passed all 50 times. But on windows, it failed often because of write.lock... this is because of unclosed writers in the test. MockDirectoryWrapper is currently unaware of this write.lock, I think it should wrap the lockfactory so that .close() will fail if there are any outstanding locks. Then hopefully these tests would fail on linux too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3152) MockDirectoryWrapper should wrap the lockfactory
[ https://issues.apache.org/jira/browse/LUCENE-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046652#comment-13046652 ] Robert Muir commented on LUCENE-3152: - oops i meant to close this MockDirectoryWrapper should wrap the lockfactory Key: LUCENE-3152 URL: https://issues.apache.org/jira/browse/LUCENE-3152 Project: Lucene - Java Issue Type: Bug Components: general/test Reporter: Robert Muir Fix For: 3.3, 4.0 Attachments: LUCENE-3152.patch After applying the patch from LUCENE-3147, I added a line to make the test fail if it cannot remove its temporary directory. I ran 'ant test' on linux 50 times, and it passed all 50 times. But on windows, it failed often because of write.lock... this is because of unclosed writers in the test. MockDirectoryWrapper is currently unaware of this write.lock, I think it should wrap the lockfactory so that .close() will fail if there are any outstanding locks. Then hopefully these tests would fail on linux too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
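The fix described in this issue amounts to tracking every lock a test acquires and failing the directory's close() while any lock is outstanding. A simplified, self-contained sketch of that idea (the interfaces here are stand-ins, not Lucene's actual LockFactory API):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: remember each acquired lock by name; close() fails fast if any
// lock (e.g. write.lock from an unclosed IndexWriter) was never released.
public class LockTrackingSketch {
    public interface Lock { void release(); }

    private final Set<String> openLocks = new HashSet<>();

    /** Acquire a named lock, remembering it until it is released. */
    public Lock makeLock(final String name) {
        if (!openLocks.add(name)) {
            throw new IllegalStateException("lock already held: " + name);
        }
        return new Lock() {
            public void release() { openLocks.remove(name); }
        };
    }

    /** Like MockDirectoryWrapper.close(): fail if locks are outstanding. */
    public void close() {
        if (!openLocks.isEmpty()) {
            throw new IllegalStateException("unreleased locks: " + openLocks);
        }
    }
}
```

This is why the wrapped lock factory makes such tests fail on Linux too: the leak is detected at close() rather than only surfacing as an un-deletable file on Windows.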
[jira] [Resolved] (LUCENE-3106) commongrams filter calls incrementToken() after it returns false
[ https://issues.apache.org/jira/browse/LUCENE-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3106. - Resolution: Fixed Fix Version/s: (was: 3.3) 3.2 4.0 this was fixed in LUCENE-3113 commongrams filter calls incrementToken() after it returns false Key: LUCENE-3106 URL: https://issues.apache.org/jira/browse/LUCENE-3106 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Fix For: 4.0, 3.2 Attachments: LUCENE-3106.patch, LUCENE-3106_test.patch In LUCENE-3064, we beefed up MockTokenizer with assertions, and I started cutting over some analysis tests to use MockTokenizer for better coverage. The commongrams tests fail, because they call incrementToken() after it already returns false. In general its my understanding consumers should not do this (and i know of a few tokenizers that will actually throw exceptions if you do this, just like java iterators and such). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
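The consumer contract described above (never call incrementToken() again after it returns false) is the same discipline as java.util.Iterator. A minimal checking wrapper in the spirit of MockTokenizer's assertions, using plain strings instead of the real TokenStream API:

```java
import java.util.Iterator;

// A strict token source that, like MockTokenizer's assertions, throws if a
// consumer keeps calling incrementToken() after exhaustion. Simplified
// illustration only; not the real MockTokenizer.
public class StrictTokenStream {
    private final Iterator<String> tokens;
    private boolean exhausted = false;
    private String current;

    public StrictTokenStream(Iterator<String> tokens) { this.tokens = tokens; }

    /** Returns false at end of stream; calling again afterwards is a consumer bug. */
    public boolean incrementToken() {
        if (exhausted) {
            throw new IllegalStateException("incrementToken() called after it returned false");
        }
        if (tokens.hasNext()) { current = tokens.next(); return true; }
        exhausted = true;
        return false;
    }

    public String token() { return current; }
}
```

A filter like CommonGramsFilter sits between such a source and the consumer, which is why its own extra call to the upstream incrementToken() trips the assertion.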
[Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not (and I assume it is not), is there some way I can convert existing indexes? (in production we have many indexes containing about 200 million docs so I'd much rather convert existing indexes than rebuild them). Thanks Bob
RE: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
One more point: some write operations using Lucene.Net 2.9.2 (add, delete, optimize, etc.) automatically upgrade your index to 2.9.2. But if your index is somehow corrupted (e.g., due to some bug in 1.9) this may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuild them). Thanks Bob
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046674#comment-13046674 ] Martin Grotzke commented on SOLR-2583: -- Yes, you're right regarding non-sparse fields. The question for the user will be when to use true or false for sparse. It might also be the case, that files differ, in that some are big, others are small. So I'm thinking about making it adaptive: when the number of lines reach a certain percentage compared to the number of docs, the float array is used, otherwise the doc-score map is used. Perhaps it would be good to allow the user to override this, s.th. like sparse=yes/no/auto. What do you think? Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046675#comment-13046675 ] Robert Muir commented on SOLR-2583: --- a smallfloat option could help too? (1/4 the ram) Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: JCC usage failure
Hi Petrus, On Thu, 9 Jun 2011, Petrus Hyvönen wrote: Thank you for the support. Finally I managed to get it working by reserving some words, and minimizing the number of wrapped methods by just including those that I specifically need: python -m jcc --jar orekit-5.0.jar --include commons-math-2.2.jar --package java.io --package org.apache.commons.math.geometry --shared --python orekit --reserved INFINITE --reserved NO_DATA --reserved ERROR --install --build Is there a way to influence the generated docstrings (__doc__ function?), or is there any way of converting from a javadoc to docstrings of the wrapped library? :) If there is a way to get at Java docstrings from the Java reflection API, then that would be a very cool addition to JCC ! Andi..
Distributed search capability
Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046688#comment-13046688 ] Yonik Seeley commented on SOLR-2583: bq. Perhaps it would be good to allow the user to override this, s.th. like sparse=yes/no/auto. Sounds good! I wonder what the memory cut-off should be for auto... 10% of maxDoc() or so? bq. a smallfloat option could help too? (1/4 the ram) Yep! Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
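The sparse=yes/no/auto idea above could be sketched as follows: pick a map of doc→score when few documents have external scores, and the existing float[maxDoc] otherwise. The 10% cutoff is just the value floated in the comment; class and method names are illustrative, not FileFloatSource's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of adaptive sparse/dense storage for external scores.
public class ExternalScoresSketch {
    public static final double AUTO_SPARSE_CUTOFF = 0.10; // fraction of maxDoc

    private final float[] dense;              // used when non-sparse
    private final Map<Integer, Float> sparse; // used when sparse
    private final float defaultScore;

    public ExternalScoresSketch(Map<Integer, Float> entries, int maxDoc, float defaultScore) {
        this.defaultScore = defaultScore;
        if (entries.size() < AUTO_SPARSE_CUTOFF * maxDoc) {
            this.sparse = new HashMap<>(entries);
            this.dense = null;
        } else {
            this.sparse = null;
            this.dense = new float[maxDoc];
            Arrays.fill(dense, defaultScore);
            for (Map.Entry<Integer, Float> e : entries.entrySet()) {
                dense[e.getKey()] = e.getValue();
            }
        }
    }

    public boolean isSparse() { return sparse != null; }

    public float score(int doc) {
        if (sparse != null) {
            Float v = sparse.get(doc);
            return v != null ? v : defaultScore;
        }
        return dense[doc];
    }
}
```

The trade-off behind the cutoff: a boxed HashMap entry costs far more per document than a 4-byte array slot, so the map only wins when the file is genuinely sparse relative to maxDoc.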
Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I tried converting the index using IndexWriter as follows: Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer()); writer.SetMaxBufferedDocs(2); writer.SetMaxMergeDocs(100); writer.SetMergeFactor(2); writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) }); writer.Commit(); That seems to work (I get what looks like a valid index directory at least). But still when I run some tests using IndexSearcher I get the same problem (I get documents in Collect() which are larger than IndexReader.MaxDoc()). Any idea what the problem could be? BTW, this is a problem because I look up some fields (date ranges, etc.) in some custom collectors which filter out documents, and it assumes I don't get any documents larger than maxDoc. Thanks, Bob On Jun 9, 2011, at 12:37 PM, Digy wrote: One more point, some write operations using Lucene.Net 2.9.2 (add, delete, optimize etc.) upgrades automatically your index to 2.9.2. But if your index is somehow corrupted(eg, due to some bug in 1.9) this may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2?
If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuild them). Thanks Bob
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046692#comment-13046692 ] Martin Grotzke commented on SOLR-2583: -- Great, sounds like a further optimization for both sparse and non-sparse files. Though, as we had 4GB taken by FileFloatSource objects a reduction to 1/4 would still be too much for us so for our case I prefer the map based approach - then with Smallfloat. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
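The "smallfloat" suggestion stores each score in one byte instead of four, which is where the 1/4-RAM figure comes from. Below is a deliberately simple stand-in showing the idea via linear quantization of scores in [0, max]; Lucene's actual SmallFloat class uses a tiny exponent/mantissa format instead, and this sketch is not its API.

```java
// One-byte score storage: 256 levels over [0, max]. Precision is traded
// for a 4x memory reduction versus float[].
public class ByteScoreSketch {
    private final float max;

    public ByteScoreSketch(float max) { this.max = max; }

    /** Encode a score in [0, max] into one of 256 levels (clamping outliers). */
    public byte encode(float score) {
        float clamped = Math.min(Math.max(score, 0f), max);
        return (byte) Math.round(255f * clamped / max);
    }

    /** Decode back to an approximate float score. */
    public float decode(byte b) {
        return (b & 0xFF) * max / 255f;
    }
}
```

Note this is orthogonal to the sparse/dense question: a byte[] still scales with maxDoc, which is why a doc→score map remains preferable for the very sparse case described above.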
[jira] [Commented] (SOLR-2529) DIH update trouble with sql field name pk
[ https://issues.apache.org/jira/browse/SOLR-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046695#comment-13046695 ] Shawn Heisey commented on SOLR-2529: I ran into a similar problem, but it had nothing to do with the name of the field. java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='did' In my dih-config.xml file I have this. The idea is simply to return a guaranteed result very quickly, so that it can then execute the deltaImportQuery, which as it happens is identical to the main query for a full-import: deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataView}" The result just has a column called MAX(did), not did. The following change made it work, because it has the right field name to match the primary key in your DIH config: deltaQuery="SELECT MAX(did) AS did FROM ${dataimporter.request.dataView}" Hopefully your problem is similar and can be easily solved in this way, but if not, this issue will still be here. DIH update trouble with sql field name pk --- Key: SOLR-2529 URL: https://issues.apache.org/jira/browse/SOLR-2529 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.1, 3.2 Environment: Debian Lenny, JRE 6 Reporter: Thomas Gambier Priority: Blocker We are unable to use the DIH when the database primary key column is named pk. The reported Solr error is: deltaQuery has no column to resolve to declared primary key pk='pk' We have made some investigations and found that the DIH makes a mistake when it looks for the primary key in the row's column list:
{code}
private String findMatchingPkColumn(String pk, Map<String, Object> row) {
  // row contains pk only when its value was null, hence the error below
  if (row.containsKey(pk))
    throw new IllegalArgumentException(String.format(
        "deltaQuery returned a row with null for primary key %s", pk));
  String resolvedPk = null;
  // look for a column that is a dotted prefix/suffix match of the declared pk
  for (String columnName : row.keySet()) {
    if (columnName.endsWith("." + pk) || pk.endsWith("." + columnName)) {
      if (resolvedPk != null)
        throw new IllegalArgumentException(String.format(
            "deltaQuery has more than one column (%s and %s) that might resolve to declared primary key pk='%s'",
            resolvedPk, columnName, pk));
      resolvedPk = columnName;
    }
  }
  if (resolvedPk == null)
    throw new IllegalArgumentException(String.format(
        "deltaQuery has no column to resolve to declared primary key pk='%s'", pk));
  LOG.info(String.format(
      "Resolving deltaQuery column '%s' to match entity's declared pk '%s'",
      resolvedPk, pk));
  return resolvedPk;
}
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
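Shawn's aliasing fix fits into a delta-import entity definition along these lines; this is a hedged sketch, and the entity name and column list are hypothetical stand-ins, not taken from the actual config — only the AS did alias in deltaQuery is the point:

```xml
<!-- Hypothetical dih-config.xml fragment illustrating the alias fix.
     Entity/column names are examples; the alias makes the deltaQuery
     result column match the entity's declared pk. -->
<entity name="item" pk="did"
        query="SELECT did, name FROM ${dataimporter.request.dataView}"
        deltaImportQuery="SELECT did, name FROM ${dataimporter.request.dataView}"
        deltaQuery="SELECT MAX(did) AS did FROM ${dataimporter.request.dataView}"/>
```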
[jira] [Updated] (LUCENE-3177) Decouple indexer from Document/Field impls
[ https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3177: --- Attachment: LUCENE-3177.patch New patch, removing IndexableDocument so now we only have IndexableField and IW accepts Iterable<? extends IndexableField> to add/updateDocument. This breaks one Lucene core test (TestDocBoost), because indexer no longer applies doc boost. I'd like to cut a new branch, and commit this starting patch there. I think (hopefully) the plan for the branch will be something like this: * Commit/iterate on this issue, which fully decouples indexer (oal.index.*) from our current Field/Fieldable/AbstractField/Document impl. This gives LUCENE-2308 more freedom to make concrete user space classes. * Commit/iterate on LUCENE-2308, which collapses the *Field hierarchy to one concrete class, and adds FieldType hierarchy. * Maybe: do LUCENE-2309 (decouple analyzers from indexer). This would mean IndexableField no longer needs isTokenized, nor the string/readerValue() methods. Indexer would just ask for the tokenStream, and the doc/field impl would go and look at its flags like NOT_ANALYZED, etc., to figure out what token stream to create. Decouple indexer from Document/Field impls -- Key: LUCENE-3177 URL: https://issues.apache.org/jira/browse/LUCENE-3177 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3177.patch, LUCENE-3177.patch I think we should define minimal iterator interfaces, IndexableDocument/Field, that indexer requires to index documents. Indexer would consume only these bare minimum interfaces, not the concrete Document/Field/FieldType classes from oal.document package. Then, the Document/Field/FieldType hierarchy is one concrete impl of these interfaces. Apps are free to make their own impls as well.
Maybe eventually we make another impl that enforces a global schema, eg factored out of Solr's impl. I think this frees design pressure on our Document/Field/FieldType hierarchy, ie, these classes are free to become concrete fully-featured user-space classes with all sorts of friendly sugar APIs for adding/removing fields, getting/setting values, types, etc., but they don't need substantial extensibility/hierarchy. Ie, the extensibility point shifts to the IndexableDocument/Field interface. I think this means we can collapse the three classes we now have for a Field (Fieldable/AbstractField/Field) down to a single concrete class (well, except for LUCENE-2308 where we want to break out dedicated classes for different field types...). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect the problem. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 8:40 PM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I tried converting the index using IndexWriter as follows:

Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer());
writer.SetMaxBufferedDocs(2);
writer.SetMaxMergeDocs(100);
writer.SetMergeFactor(2);
writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
writer.Commit();

That seems to work (I get what looks like a valid index directory at least). But still when I run some tests using IndexSearcher I get the same problem (I get documents in Collect() which are larger than IndexReader.MaxDoc()). Any idea what the problem could be? BTW, this is a problem because I look up some fields (date ranges, etc.) in some custom collectors which filter out documents, and they assume I don't get any documents larger than maxDoc. Thanks, Bob On Jun 9, 2011, at 12:37 PM, Digy wrote: One more point: some write operations using Lucene.Net 2.9.2 (add, delete, optimize etc.) automatically upgrade your index to 2.9.2. But if your index is somehow corrupted (e.g., due to some bug in 1.9) this may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc.
My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs, so I'd rather convert existing indexes than rebuild them). Thanks, Bob
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046707#comment-13046707 ] Yonik Seeley commented on SOLR-2564: bq. Actually, the worst case is twice as slow due to unneeded caching of a simple query. bq. Sorry, what do you mean here? The worst case with this patch as a whole (due to the caching by default). This type of query is twice as slow: {code} http://localhost:8983/solr/select?q=*:*&group=true&group.field=single1000_i {code} Which led me to wonder how complex queries must be before the caching is a win. Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046712#comment-13046712 ] Martin Grotzke commented on SOLR-2583: -- Sounds good! I wonder what the memory cut-off should be for auto... 10% of maxDoc() or so? I'd compare both strategies to see where the break-even is; this should give an absolute number. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring uses a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are many fewer entries in the scoring file than there are docs in total, the big float array wastes much memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but no more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046715#comment-13046715 ] Michael McCandless commented on LUCENE-2308: I created a new branch, where we can iterate on these interlinked issues: {noformat} https://svn.apache.org/repos/asf/lucene/dev/branches/fieldtype {noformat} And I committed the initial patch from LUCENE-3177, decoupling indexer from the doc/field impl. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: managing CHANGES.txt?
: you just commit it to the version it was added. : : For example, if you are adding something to 3x and trunk, commit it to : the 3x section of trunk's CHANGES.txt : then when you svn merge, there will be no merge conflict, it will just work. That assumes you know, before committing to trunk, that it will (or won't) be backported to 3x. The approach (and the cleanness of the merges) completely breaks down if you start out assuming a feature is targeting 4x, and then later decide to backport it. It will also break down in even more fun and confusing ways if/when we have our first 4.0 release and then someone pushes for having a 3.42 feature release after that (to push out some high value features to people not yet ready to upgrade to 4.0), because the changes legitimately need to show up in both the 3.42 and 4.1 release notes. I've tried to raise these concerns several times in the past and gotten virtually no response... http://markmail.org/message/s6zq4e7aomanxulp http://search.lucidimagination.com/search/document/9a9b1327fe281305/solr_changes_3_1_4_0 I really think that the 4.0 section of CHANGES should list *every* change on the trunk prior to the 4.0 release, even if it was backported to 3.1 or 3.3 -- because fundamentally the changes are not necessarily identical. A bug fix that has been backported may be subtly different because of the differences between the branches. I also (still) agree with Ryan about the historic record nature of CHANGES.txt not making sense anymore now that we have multiple feature release branches going at once... Can we delete everything past line 441 in: https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt and add a comment saying to look at: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/CHANGES.txt +1 -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046766#comment-13046766 ] Michael McCandless commented on SOLR-2564: -- Ahh, I see. Could we turn off caching if the query is instanceof AllDocsQuery? Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on grouping module is going well. I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
solr example != solr defaults
Trying to catch up on my email/jira, I notice this comment from rmuir in SOLR-2519... I think we need to stop kidding ourselves about example/default and just recognize that 99.999% of users just use the example as their default configuration. Guys, the example is the default, there is simply no argument, this is the reality! While I agree that we should recognize and expect solr users to start with the example configs and use them as their default configs, under no circumstances should we get in the habit of referring to things specified in those configs as the default behavior or the default settings. This isn't a question of kidding ourselves, it's a question of genuinely confusing users about the difference between behavior that exists because of what is in the example configs that they may have copied, and behavior that exists because of hardcoded defaults in java code. Example #1: for backwards compatibility, the default lockType used in solr when no <lockType/> declaration is found is simple, but the *example* <lockType/> declared in the *example* configs is native. Example #2: Many request handler instances are declared/configured in the example solrconfig.xml file, but only 1 request handler instance will exist by *default* if the user removes those <requestHandler/> declarations from the solrconfig.xml. The point is: If you find yourself getting into the habit of referring to config values/settings in the example configs as the defaults, then you *will* mislead users into thinking that you are describing the default behavior when those values/settings are absent from the configs. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
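Hoss's first example can be made concrete; the snippet below is a hedged illustration of the example-vs-default distinction, not copied from the shipped solrconfig.xml:

```xml
<!-- What the *example* solrconfig.xml declares (an example setting,
     not a built-in default): -->
<lockType>native</lockType>
<!-- With no <lockType/> declaration at all, the hardcoded fallback in
     the Java code is "simple", kept for backwards compatibility. -->
```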
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046785#comment-13046785 ] Robert Muir commented on SOLR-2583: --- bq. Though, as we had 4GB taken by FileFloatSource objects, a reduction to 1/4 would still be too much for us, so for our case I prefer the map-based approach - then with SmallFloat. If the problem is sparsity, maybe use a two-stage table; still faster than a hashmap and much better for the worst case. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring uses a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are many fewer entries in the scoring file than there are docs in total, the big float array wastes much memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but no more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
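The "two-stage table" Robert mentions can be sketched as a paged float array: only pages that actually receive an external-file entry are allocated, so memory scales with the touched pages rather than with maxDoc, while lookups stay two array reads instead of a hash probe. Class and method names here are illustrative, not from Solr's FileFloatSource:

```java
// Hedged sketch of a two-stage (paged) per-doc float table.
public class TwoStageFloatTable {
  private static final int PAGE_BITS = 12;            // 4096 floats per page
  private static final int PAGE_SIZE = 1 << PAGE_BITS;
  private static final int PAGE_MASK = PAGE_SIZE - 1;

  private final float[][] pages;                      // first stage: one slot per page
  private final float defaultValue;                   // score for untouched docs

  public TwoStageFloatTable(int maxDoc, float defaultValue) {
    this.pages = new float[(maxDoc + PAGE_SIZE - 1) / PAGE_SIZE][];
    this.defaultValue = defaultValue;
  }

  public void set(int doc, float value) {
    float[] page = pages[doc >>> PAGE_BITS];
    if (page == null) {                               // lazily allocate second stage
      page = new float[PAGE_SIZE];
      java.util.Arrays.fill(page, defaultValue);
      pages[doc >>> PAGE_BITS] = page;
    }
    page[doc & PAGE_MASK] = value;
  }

  public float get(int doc) {                         // two array reads, no hashing
    float[] page = pages[doc >>> PAGE_BITS];
    return page == null ? defaultValue : page[doc & PAGE_MASK];
  }
}
```

If the scoring file touches only a small, clustered subset of docs, most first-stage slots stay null, which is the sparsity win over one float per doc; the worst case (every page touched) degrades gracefully to roughly the flat array's footprint.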
Re: managing CHANGES.txt?
On Thu, Jun 9, 2011 at 3:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : you just commit it to the version it was added. : : For example, if you are adding something to 3x and trunk, commit it to : the 3x section of trunk's CHANGES.txt : then when you svn merge, there will be no merge conflict, it will just work. That assumes you know, before committing to trunk, that it will (or won't) be backported to 3x. The approach (and the cleanness of the merges) completely breaks down if you start out assuming a feature is targeting 4x, and then later decide to backport it. you just first move your change to the 3.x section? it will also break down in even more fun and confusing ways if/when we have our first 4.0 release and then someone pushes for having a 3.42 feature release after that (to push out some high value features to people not yet ready to upgrade to 4.0) because the changes legitimately need to show up in both the 3.42 and 4.1 release notes. we already raised this issue and decided against it for a number of reasons; it was raised on the dev list and everyone voted +1 http://www.lucidimagination.com/search/document/a42f9a22fe39c4b4/discussion_trunk_and_stable_release_strategy#67815ec25c055810 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: solr example != solr defaults
On Thu, Jun 9, 2011 at 3:47 PM, Chris Hostetter hossman_luc...@fucit.org wrote: The point is: If you find yourself getting into the habit of referring to config values/settings in the example configs as the defaults, then you *will* mislead users into thinking that you are describing the default behavior when those values/settings are absent from the configs. I'm not really going to get hung up on the technicalities here. We can call what happens when there is no configuration the fallback settings, if that's less confusing, but to me the example is the defaults. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I found the problem. I have a custom query optimizer that replaces certain TermQuerys within a Boolean query with a custom Query, and this query has its own weight/scorer that retrieves matching documents from an in-memory cache (which is not Lucene-backed). But it looks like my custom hit collectors are now wrapped in a HitCollectorWrapper, which assumes Collect() needs to be called for multiple segments - so it is adding a start offset to the doc ID that comes from my custom query implementation. I looked at the new Collector class and it seems it works the same way (assumes it needs to set the next index reader with some offset). How can I make my custom query work with the new API (so that there is basically a single segment in RAM that my query uses, but other query clauses in the same boolean query still use multiple Lucene segments)? I am sure that is not clear and will try to provide more detail soon. Thanks, Bob On Jun 9, 2011, at 1:48 PM, Digy wrote: Sorry no idea. Maybe optimizing the index with 2.9.2 can help to detect the problem. DIGY
Re: managing CHANGES.txt?
On Thu, Jun 9, 2011 at 4:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : The approach (and the cleanness of the merges) completely breaks down if : you start out assuming a feature is targeting 4x, and then later decide : to backport it. : : you just first move your change to the 3.x section? so you're saying that to backport something from trunk to 3x the process should be: * first you should commit a change to trunk's CHANGES.txt moving the previously committed entry to the appropriate 3.x.y section * then you should merge the *two* commits to the 3x branch ? I think so? I guess in general, most things unless they are super-scary tend to get backported immediately to 3.x (and you know up-front you are going to do this) so in practice this hasn't been a problem? : we already raised this issue and decided against it for a number of : reasons, it was raised on the dev list and everyone voted +1 : : http://www.lucidimagination.com/search/document/a42f9a22fe39c4b4/discussion_trunk_and_stable_release_strategy#67815ec25c055810 I contest your characterization of everyone, but clearly I missed that thread back when it happened. That only addresses the issue of 3.x feature releases after 4.0 comes out -- but it still doesn't address the problem of bug fixes backported from 4.x to 3.x after 4.0 -- those will still be a serious problem if we keep removing things from the trunk CHANGES.txt when backporting. OK, well everyone that did vote, voted +1. If you disagree please respond to that thread! I think it would make things confusing if we released 4.0 say today, then released 3.3 later, and 4.0 couldn't read 3.3 indexes... but please reply to it. As far as bugfix releases, in lucene we have always had this issue (e.g. if we do 3.2.1 we have the issue now). That's why we have in our ReleaseTODO a task where we deal with this (and I noticed it had been missing from one of the bugfix 3.0.x releases and fixed that for 3.2).
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
hey jason, you are talking about the RMI contrib/remote? It was dropped a while ago since everybody rolls their own mechanism and some queries / filters didn't work with it. simon On Thu, Jun 9, 2011 at 7:29 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046845#comment-13046845 ] Martijn van Groningen commented on SOLR-2564: - {quote} But I think caching should still default to on, just limited as a pctg of the number of docs in the index. Ie, by default we will cache the result set if it's less than 20% (say) of total docs in your index. {quote} Maybe instead of specifying a maximum size for the second pass cache, we could specify it as a percentage (0 to 100) relative to maxDoc. That way, as the index grows in number of documents, the cache is still used for a lot of queries (depending on the specified percentage). So if we go with this, maybe group.cacheMB should be renamed to group.cache.percentage. The default can then be something like 20. Any thoughts about this? Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on grouping module is going well. I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: managing CHANGES.txt?
: we already raised this issue and decided against it for a number of : reasons, it was raised on the dev list and everyone voted +1 : : http://www.lucidimagination.com/search/document/a42f9a22fe39c4b4/discussion_trunk_and_stable_release_strategy#67815ec25c055810 I contest your characterization of everyone, but clearly I missed that thread back when it happened. That only addresses the issue of 3.x feature releases after 4.0 comes out -- but it still doesn't address the problem of bug fixes backported from 4.x to 3.x after 4.0 -- those will still be a serious problem if we keep removing things from the trunk CHANGES.txt when backporting. OK, well everyone that did vote, voted +1. If you disagree please respond to that thread! I think it would make things confusing if we released 4.0 say today, then released 3.3 later, and 4.0 couldn't read 3.3 indexes... but please reply to it. The release strategy and CHANGES strategy seem different (but related) to me. I agree with the release strategy outlined in that thread, but don't see how it answers questions about maintaining CHANGES.txt. The thing that seems weird is that the historic release info in CHANGES.txt is potentially different than what will presumably be released in the 3.x branch. For example right now, if you take the 3.x lucene/CHANGES and paste them in the right place on trunk, there are a bunch of diffs for names with accents:
- have been deleted. (Christian Kohlsch├╝tter via Mike McCandless)
+ have been deleted. (Christian Kohlschⁿtter via Mike McCandless)
but also real differences like:
-* LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
-  multi-segment index (Michael McCandless)
The same exercise in solr/CHANGES.txt reveals lots of differences. Is this expected? It seems more like a by-product of trying to keep things in sync.
I suppose that could be fixed with some good To simplify the process, I suggest we remove historic info from /trunk and point people to the CHANGES in the current stable branch (3.x) -- when /trunk is moved to /branch_4x we would move everything there. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
Right, if that's not around, one needs to use multi searcher, that's gone too? On Jun 9, 2011 2:39 PM, Simon Willnauer simon.willna...@googlemail.com wrote: hey jason, you are talking about the RMI contrib/remote? It was dropped a while ago since everybody rolls its own mechanism and some queries / filters didn't work with it. simon On Thu, Jun 9, 2011 at 7:29 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2955: --- Attachment: LUCENE-2955.patch OK, new patch, folding in Simon's and Chris's feedback (thanks!). I pulled out the reopen thread into a separate class, so that one can now instantiate NRTManager but do their own reopening (no bg reopen thread). So eg if you want to hijack indexing threads to do reopen, you can. But if you want to simply reopen on a periodic basis with the bg thread, instantiate NRTManagerReopenThread, passing it the manager and your max and min staleness. Max staleness applies when no caller is waiting for a specific indexing change; min applies when one is. I didn't implement a ReopenStrategy... I think that should live above this class. But, I did add a WaitingListener so that such a reopener can be notified when someone is waiting for a specific generation to be visible (NRTManagerReopenThread uses that). Add utility class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch, LUCENE-2955.patch I created a simple class, NRTManager, that tries to abstract away some of the reopen logic when using NRT readers. You give it your IW, tell it min and max nanoseconds staleness you can tolerate, and it privately runs a reopen thread to periodically reopen the searcher. It subsumes the SearcherManager from LIA2. Besides running the reopen thread, it also adds the notion of a generation containing changes you've made. So eg it has addDocument, returning a long. You can then take that long value and pass it back to the getSearcher method and getSearcher will return a searcher that reflects the changes made in that generation.
This gives your app the freedom to force immediate consistency (ie wait for the reopen) only for those searches that require it, like a verifier that adds a doc and then immediately searches for it, but also to use eventual consistency for other searches. I want to also add support for the new applyDeletions option when pulling an NRT reader. Also, this is very new and I'm sure buggy -- the concurrency is either wrong or overly-locking. But it's a start...

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
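The generation idea above can be sketched in a few lines of plain Java. This is NOT the NRTManager API from the patch -- the class and method names here (GenerationTracker, recordChange, reopened, waitForGeneration) are hypothetical, purely to illustrate the pattern: each indexing change bumps a generation counter, a reopen advances the searchable generation, and a caller wanting immediate consistency waits until the searchable generation catches up to the one its change landed in.

```java
// Minimal sketch of the generation-tracking pattern described above.
// Illustrative only; not the actual NRTManager API.
public class GenerationTracker {
    private long indexingGen = 0;   // generation of the latest indexed change
    private long searchingGen = 0;  // generation covered by the current searcher

    // Analogous to addDocument returning a long: record a change and
    // hand back the generation that contains it.
    public synchronized long recordChange() {
        return ++indexingGen;
    }

    // Called after a successful reopen: everything indexed up to 'gen'
    // is now visible to searches.
    public synchronized void reopened(long gen) {
        if (gen > searchingGen) {
            searchingGen = gen;
            notifyAll(); // wake any caller waiting on this generation
        }
    }

    // A caller wanting immediate consistency blocks here with the long
    // it got back from recordChange, until a reopen catches up.
    public synchronized void waitForGeneration(long gen) throws InterruptedException {
        while (searchingGen < gen) {
            wait();
        }
    }

    public synchronized boolean isVisible(long gen) {
        return searchingGen >= gen;
    }

    public static void main(String[] args) {
        GenerationTracker t = new GenerationTracker();
        long gen = t.recordChange();          // "addDocument" -> generation 1
        System.out.println(t.isVisible(gen)); // false: no reopen yet
        t.reopened(gen);                      // reopen thread catches up
        System.out.println(t.isVisible(gen)); // true: change is now searchable
    }
}
```

In the real class the reopen would be driven either by the bg NRTManagerReopenThread (bounded by max/min staleness) or by a hijacked indexing thread, while callers needing a specific change use the generation they were handed back.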
Re: Distributed search capability
On 6/10/11 12:10 AM, Jason Rutherglen wrote: Right, if that's not around, one needs to use multi searcher, that's gone too? Yes, and rightfully so - it didn't properly handle some query types, so you would actually get wrong results. For now the answer is: use Solr if you are less advanced, or roll your own (and contribute it back!) if you are more advanced ;) -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046911#comment-13046911 ] Michael McCandless commented on SOLR-2564: -- +1

Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch

Since work on the grouping module is going well, I think it is time to wire it up in Solr. Besides the current grouping features Solr provides, Solr will then also support second-pass caching and total count based on groups.
[JENKINS] Lucene-3.x - Build # 400 - Failure
Build: https://builds.apache.org/job/Lucene-3.x/400/ No tests ran. Build Log (for compile errors): [...truncated 9153 lines...]
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046927#comment-13046927 ] Koji Sekiguchi commented on SOLR-2582: --

bq. it's possible to get the uniquekey from the SolrCore passed in the initialize() method

Yep, we've got the solrCore. That was a blind spot -- I don't know why I passed over it!

bq. I think they're related but the approach proposed here is slightly different since it considers the uniquekey instead of the analyzed text as the alternative to the logField. Maybe the best solution is applying the patch in SOLR-2579 and then making the error message more useful with other debugging information.

Will do.

UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3

In UIMAUpdateRequestProcessor, the catch block in the processAdd() method can hit a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
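The exception described in the issue comes from Java's String.substring, which throws StringIndexOutOfBoundsException when the end index exceeds the string's length. Clamping the end index to the text length is the usual fix. A minimal self-contained sketch (the SnippetUtil class and snippet method are hypothetical helper names, not the actual Solr code):

```java
public class SnippetUtil {
    // text.substring(0, 100) throws StringIndexOutOfBoundsException when
    // text.length() < 100; clamping the end index with Math.min makes the
    // call safe for texts of any length.
    public static String snippet(String text, int maxLen) {
        return text.substring(0, Math.min(maxLen, text.length()));
    }

    public static void main(String[] args) {
        System.out.println(snippet("short text", 100)); // prints "short text"
        System.out.println(snippet("0123456789", 5));   // prints "01234"
    }
}
```

Applied to the error-message code in processAdd(), this turns the unconditional append(text.substring(0, 100)) into an append that works even when the processed text is under 100 chars.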
Re: Distributed search capability
> Yes, and rightfully so - it didn't handle properly some query types, so
> you would actually get wrong results.

That's bad!

> roll your own (and contribute it back!) if you are more advanced ;)

Wouldn't roll your own basically mean resurrecting the previous implementation of MultiSearcher? Ie, what would be different?

On Thu, Jun 9, 2011 at 4:07 PM, Andrzej Bialecki a...@getopt.org wrote: