[Lucene.Net] [jira] [Closed] (LUCENENET-421) Segment files occasionally disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-421. -- Resolution: Invalid Seems like the reporter is no longer interested in this issue. DIGY Segment files occasionally disappearing making index corrupted Key: LUCENENET-421 URL: https://issues.apache.org/jira/browse/LUCENENET-421 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1 Reporter: Fedor Taiakin IIS 7 under Win 2008 R2, index located on RAID 1 The only operations are Add Document and Delete Document, optimize = false. Occasionally the segment files disappear, corrupting the index. No other exceptions prior to the inability to open the index: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. --- System.IO.FileNotFoundException: Could not find file 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. File name: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs' at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-404) Improve brand logo design
[ https://issues.apache.org/jira/browse/LUCENENET-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046757#comment-13046757 ] Troy Howard commented on LUCENENET-404: --- I will post the artifacts here, and commit them to the repo. We need to get an SGA submitted from StackOverflow as well. Improve brand logo design - Key: LUCENENET-404 URL: https://issues.apache.org/jira/browse/LUCENENET-404 Project: Lucene.Net Issue Type: Sub-task Components: Project Infrastructure Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Labels: branding, logo The existing Lucene.Net logo leaves a lot to be desired. We'd like a new logo that is modern and well designed. To implement this, Troy is coordinating with StackOverflow/StackExchange to manage a logo design contest, the results of which will be our new logo design.
[Lucene.Net] [jira] [Created] (LUCENENET-424) IsolatedStorage Support for Windows Phone 7
IsolatedStorage Support for Windows Phone 7 --- Key: LUCENENET-424 URL: https://issues.apache.org/jira/browse/LUCENENET-424 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Reporter: Prescott Nasser Assignee: Prescott Nasser Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Create an IsolatedStorage Store to support Windows Phone 7
[jira] [Resolved] (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**
[ https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1707. -- Resolution: Not A Problem This is a trivial issue Use google collections immutable collections instead of Collections.unmodifiable** -- Key: SOLR-1707 URL: https://issues.apache.org/jira/browse/SOLR-1707 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Fix For: 3.3 Attachments: SOLR-1707.patch, TestPerf.java google collections offer true immutability and more memory efficiency - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3 In UIMAUpdateRequestProcessor the catch block in the processAdd() method can have a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046370#comment-13046370 ] Tommaso Teofili commented on SOLR-2582: --- A possible fix which still allows easy debugging could be to get the logging field property on processor initialization; then, if that was not configured, it's possible to get the unique key from the SolrCore passed in the initialize() method: String logFieldName = solrUIMAConfiguration.getLogField() != null ? solrUIMAConfiguration.getLogField() : solrCore.getSchema().getUniqueKeyField().getName(); UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3 In UIMAUpdateRequestProcessor the catch block in the processAdd() method can have a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
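The bounds bug and its guard can be sketched as follows; the class and method names here are illustrative, not the actual UIMAUpdateRequestProcessor code:

```java
// Minimal sketch of the SOLR-2582 fix: clamp the snippet length so that
// texts shorter than 100 chars no longer trigger StringIndexOutOfBoundsException.
// Class/method names are hypothetical stand-ins for the real processor code.
public class ErrorMessageDemo {
    public static String errorSnippet(String text) {
        StringBuilder sb = new StringBuilder("processing error on text: ");
        // text.substring(0, 100) throws for short texts; Math.min guards it
        sb.append(text.substring(0, Math.min(100, text.length())));
        return sb.toString();
    }
}
```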
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8710 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8710/ 3 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-131: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test7041670436tmp/_c.nrm (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-131: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test7041670436tmp/_c.nrm (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test7041670436tmp/_c.nrm (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching Error Message: null Stack Trace: junit.framework.AssertionFailedError: at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:152) at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:165) REGRESSION: org.apache.lucene.store.TestMultiMMap.testRandomChunkSizes Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/TestMultiMMap3983339852tmp/mmap363983339854tmp/_0_0.tib (Too many open files in system) Stack Trace: java.io.FileNotFoundException: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/TestMultiMMap3983339852tmp/mmap363983339854tmp/_0_0.tib (Too many open files in system) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:233) at org.apache.lucene.store.FSDirectory$FSIndexOutput.init(FSDirectory.java:416) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293) at org.apache.lucene.index.codecs.BlockTermsWriter.init(BlockTermsWriter.java:75) at org.apache.lucene.index.codecs.mocksep.MockSepCodec.fieldsConsumer(MockSepCodec.java:73) at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.init(PerFieldCodecWrapper.java:67) at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:55) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:58) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:80) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:75) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:457) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:421) at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:313) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:385) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1233) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1214) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:136) at org.apache.lucene.store.TestMultiMMap.assertChunking(TestMultiMMap.java:74) at org.apache.lucene.store.TestMultiMMap.testRandomChunkSizes(TestMultiMMap.java:51) at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) Build Log (for compile errors): [...truncated 3909 lines...]
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046403#comment-13046403 ] Simon Willnauer commented on LUCENE-2793: - Hey Varun, here are some more comments for the latest complete patch: * We should have a static instance for IOContext with Context.Other which you can use in BitVector / CheckIndex for instance. Maybe IOContext#DEFAULT_CONTEXT * It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfos * This is unrelated to your patch, but in PreFlexFields we should use IndexFileNames.segmentFileName(info.name, , PreFlexCodec.FREQ_EXTENSION) and IndexFileNames.segmentFileName(info.name, , PreFlexCodec.PROX_EXTENSION) instead of info.name + .frq and info.name + .prx * It seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext into SegmentWriteState and SegmentReadState; that way we don't need to change the Codec interface and clutter it with internals. This would also fix Mike's comment for FieldsConsumer etc. * TermVectorsWriter is only used in merges, so maybe it should also get a Context.Merge for consistency? * I really don't like OneMerge :) I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes and totalDocCount for now. * Small typo in RamDirectory; there is a space missing after the second file here: dir.copy(this, file, file, context); * SegmentReader should also use the static default IOContext - make sure it's used where needed :) Regarding the IOContext class, I think we should design for what we have right now, and since SegmentInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above.
Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible.
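The IOContext discussed above can be sketched roughly as follows. This is a hedged reading of the proposal in the comments (a context enum, a static default instance, and a MergeInfo abstraction exposing estimatedMergeBytes/totalDocCount), not a committed Lucene API; all names follow the discussion and may differ from what was eventually landed.

```java
// Sketch of the proposed IOContext, per the LUCENE-2793 discussion.
// Names (Context, MergeInfo, DEFAULT) mirror the comments, not final Lucene code.
public class IOContextSketch {
    public enum Context { MERGE, READ, FLUSH, OTHER }

    /** Abstraction replacing a direct OneMerge dependency, as suggested above. */
    public static class MergeInfo {
        public final long estimatedMergeBytes;
        public final int totalDocCount;
        public MergeInfo(long estimatedMergeBytes, int totalDocCount) {
            this.estimatedMergeBytes = estimatedMergeBytes;
            this.totalDocCount = totalDocCount;
        }
    }

    public final Context context;
    public final MergeInfo mergeInfo; // null unless context == MERGE

    public IOContextSketch(Context context, MergeInfo mergeInfo) {
        this.context = context;
        this.mergeInfo = mergeInfo;
    }

    /** Static default context, usable e.g. from BitVector / CheckIndex. */
    public static final IOContextSketch DEFAULT = new IOContextSketch(Context.OTHER, null);
}
```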
[jira] [Commented] (LUCENE-3180) Can't delete a document using deleteDocument(int docID) if using IndexWriter AND IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046415#comment-13046415 ] Danny Lade commented on LUCENE-3180: Hello Simon, I will add an ID to my documents, that should do the trick. Thanks for the help. :-) Can't delete a document using deleteDocument(int docID) if using IndexWriter AND IndexReader Key: LUCENE-3180 URL: https://issues.apache.org/jira/browse/LUCENE-3180 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.2 Environment: Windows Reporter: Danny Lade Attachments: ImpossibleLuceneCode.java It is impossible to delete a document with reader.deleteDocument(docID) if using an IndexWriter too. using: {code:java} writer = new IndexWriter(directory, config); reader = IndexReader.open(writer, true); {code} results in: {code:java} Exception in thread main java.lang.UnsupportedOperationException: This IndexReader cannot make any changes to the index (it was opened with readOnly = true) at org.apache.lucene.index.ReadOnlySegmentReader.noWrite(ReadOnlySegmentReader.java:23) at org.apache.lucene.index.ReadOnlyDirectoryReader.acquireWriteLock(ReadOnlyDirectoryReader.java:43) at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:1067) at de.morpheum.morphy.ImpossibleLuceneCode.main(ImpossibleLuceneCode.java:60) {code} and using: {code:java} writer = new IndexWriter(directory, config); reader = IndexReader.open(directory, false); {code} results in: {code:java} org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@S:\Java\Morpheum\lucene\write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.DirectoryReader.acquireWriteLock(DirectoryReader.java:765) at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:1067) at de.morpheum.morphy.ImpossibleLuceneCode.main(ImpossibleLuceneCode.java:69) {code}
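The workaround agreed on in the thread is to key each document with an application-level ID field and delete through the IndexWriter rather than the (read-only or lock-contended) IndexReader. A sketch of that pattern, assuming Lucene 3.x on the classpath and an "id" field indexed un-analyzed; shown as a usage fragment, not a complete program:

```
// Fragment (Lucene 3.x assumed): delete by ID term via the writer,
// instead of reader.deleteDocument(docID) which needs the write lock.
Document doc = new Document();
doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

// Later: delete the document through the same writer, keyed by its ID.
writer.deleteDocuments(new Term("id", "42"));
writer.commit();
```

This sidesteps both failures in the report: the writer already holds the write lock, and no read-only reader is asked to mutate the index.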
[jira] [Created] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type
DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently segment merger fails if the docvalues type is not compatible across segments. We already catch this problem if somebody changes the values type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers. I marked this 4.0 since it should not block the landing on trunk
[jira] [Created] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor External scoring consumes a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If the scoring file contains far fewer entries than there are docs in total, the big float array wastes a lot of memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but not more.
Re: JCC usage failure
Thank you for the support. I finally managed to get it working by reserving some words, and minimizing the number of wrapped methods by just including those that I specifically need: python -m jcc --jar orekit-5.0.jar --include commons-math-2.2.jar --package java.io --package org.apache.commons.math.geometry --shared --python orekit --reserved INFINITE --reserved NO_DATA --reserved ERROR --install --build Is there a way to influence the docstrings generated (__doc__ function?), or is there any way of converting from a javadoc to docstrings of the wrapped library? :) Thanks Regards /Petrus On Fri, Jun 3, 2011 at 5:39 PM, Andi Vajda va...@apache.org wrote: On Jun 3, 2011, at 1:21, Petrus Hyvönen petrus.hyvo...@gmail.com wrote: Hi, I am trying to use JCC to wrap a java library (orekit.org), and have successfully done so on the Mac platform. As I also use Windows, I am trying to do the same there. JCC compiles fine on both platforms (using --compiler=mingw32 on Windows, using the python xy distribution with mingw v4.5.2). The wrapper is successfully created on Mac, but on Windows I needed to add the .__main__ for jcc: python -m jcc.__main__ --jar orekit-5.0.jar --jar commons-math-2.2.jar --include orekit-data.zip --shared --python orekit --install --files separate --build The build goes on for some time and fails with the extract below. Does anyone have experience with this failure, and where does one start to solve it: is it the compiler, or JCC? I have also tried with a fresh install of mingw32 but no difference. Any help or directions appreciated. /Petrus This is very likely to be caused by some variable name coming from your java sources that is defined as a macro by the header files coming from your compiler. To work around this, add the variable name to the reserved word list by adding it to the jcc command line via the --reserved flag. To find which variable it is, look at the error messages below and at the code they refer to. For example, Dfp.h, line 109 or Dfp.cpp, line 22.
Andi.. In file included from build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:3:0: build\_orekit/org/apache/commons/math/dfp/Dfp.h:109:38: error: expected unqualified-id before numeric constant build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:22:32: error: expected unqualified-id before numeric constant build\_orekit\org\apache\commons\math\dfp\Dfp.cpp: In static member function 'static _jclass* org::apache::commons::math::dfp::Dfp::initializeClass()': build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:100:79: error: lvalue required as left operand of assignment build\_orekit\org\apache\commons\math\dfp\Dfp.cpp: In static member function 'static void org::apache::commons::math::dfp::t_Dfp::initialize(PyObject*)': build\_orekit\org\apache\commons\math\dfp\Dfp.cpp:476:101: error: expected unqualified-id before numeric constant error: command 'gcc' failed with exit status 1 -- _ Petrus Hyvönen, Uppsala, Sweden Mobile Phone/SMS:+46 73 803 19 00
[jira] [Updated] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grotzke updated SOLR-2583: - Attachment: FileFloatSource.java.patch The attached patch changes FileFloatSource to use a map of score by doc. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring consumes a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If the scoring file contains far fewer entries than there are docs in total, the big float array wastes a lot of memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but not more.
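The sparse alternative proposed in SOLR-2583 can be sketched as follows; this is an illustrative stand-in for the attached FileFloatSource patch (the class and method names here are hypothetical), showing the doc -> score map sized by the external file rather than a float[maxDoc] of mostly zeros:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the SOLR-2583 idea: one map entry per line of the external
// scoring file, instead of a dense float array sized to the whole index.
public class SparseExternalScores {
    private final Map<Integer, Float> scores = new HashMap<>();

    /** Record a score parsed from the external file. */
    public void put(int docId, float score) {
        scores.put(docId, score);
    }

    /** Docs absent from the file get the default value, as the dense array would. */
    public float get(int docId, float defaultValue) {
        Float v = scores.get(docId);
        return v == null ? defaultValue : v;
    }

    /** Memory now scales with file entries, not with maxDoc. */
    public int size() {
        return scores.size();
    }
}
```

The trade-off is per-lookup boxing and hashing cost against the O(maxDoc) allocation; for scoring files that cover only a small fraction of the index, the map wins on memory.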
RE: ant javadocs-test-framework failure
Hi, References to local fields must start with #. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: Thursday, June 09, 2011 2:09 AM To: dev@lucene.apache.org Subject: ant javadocs-test-framework failure I'm getting a failure on: ant javadocs-test-framework {@link RANDOM_MULTIPLIER} is failing -- I don't really get why... ryan@bicho~ $ java -version java version 1.6.0_25 Java(TM) SE Runtime Environment (build 1.6.0_25-b06) Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing) ryan@bicho~ $ ant -version Apache Ant version 1.7.1 compiled on June 27 2008
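Uwe's fix can be illustrated with a minimal class (names and values here are illustrative, not the actual test-framework code): a {@link} to a field of the enclosing class needs a leading '#', otherwise the javadoc tool reports it as an unresolved reference.

```java
/**
 * Demonstrates the javadoc rule from the thread: reference a local field as
 * {@link #RANDOM_MULTIPLIER}, not {@link RANDOM_MULTIPLIER} (which fails).
 * Hypothetical class; the constant value is made up for the example.
 */
public class JavadocLinkDemo {
    /** Multiplier applied by {@link #multiplied(int)}. */
    public static final int RANDOM_MULTIPLIER = 1;

    /** Returns {@code n} times {@link #RANDOM_MULTIPLIER}. */
    public static int multiplied(int n) {
        return n * RANDOM_MULTIPLIER;
    }
}
```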
Remove @version tags from JDocs
hey folks, in solr and some lucene classes we have @version tags with svn $Id stuff in there which we got rid of in lucene a while ago. I went through all classes and removed them. I just want to check with everybody if it's OK to commit that. Note: I only changed javadocs; all other usage of $Id etc. still remains. simon
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046451#comment-13046451 ] Michael McCandless commented on LUCENE-3108: I did another review here -- I think it's ready to land on trunk! Nice work Simon! Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch Its time to move another feature from branch to trunk. I want to start this process now while still a couple of issues remain on the branch. Currently I am down to a single nocommit (javadocs on DocValues.java) and a couple of testing TODOs (explicit multithreaded tests and unoptimized with deletions) but I think those are not worth separate issues so we can resolve them as we go. The already created issues (LUCENE-3075 and LUCENE-3074) should not block this process here IMO, we can fix them once we are on trunk. Here is a quick feature overview of what has been implemented: * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, Bytes (fixed / variable size each in sorted, straight and deref variations) * Integration into Flex-API, Codec provides a PerDocConsumer-DocValuesConsumer (write) / PerDocValues-DocValues (read) * By-Default enabled in all codecs except of PreFlex * Follows other flex-API patterns like non-segment reader throw UOE forcing MultiPerDocValues if on DirReader etc. * Integration into IndexWriter, FieldInfos etc. 
* Random-testing enabled via RandomIW - injecting random DocValues into documents * Basic checks in CheckIndex (which runs after each test) * FieldComparator for int and float variants (Sorting, currently directly integrated into SortField, this might go into a separate DocValuesSortField eventually) * Extended TestSort for DocValues * RAM-Resident random access API plus on-disk DocValuesEnum (currently only sequential access) - Source.java / DocValuesEnum.java * Extensible Cache implementation for RAM-Resident DocValues (by-default loaded into RAM only once and freed once IR is closed) - SourceCache.java PS: Currently the RAM resident API is named Source (Source.java) which seems too generic. I think we should rename it to RamDocValues or something like that; suggestions welcome! Any comments, questions (rants :)) are very much appreciated.
RE: Remove @version tags from JDocs
+1 - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Thursday, June 09, 2011 12:25 PM To: dev@lucene.apache.org Subject: Remove @version tags from JDocs hey folks, in solr and some lucene classes we have @version tags with svn $Id stuff in there which we got rid of in lucene a while ago. I went through all classes and removed them. I just want to check with everybody if it's OK to commit that. Note: I only changed javadocs; all other usage of $Id etc. still remains. simon
[jira] [Resolved] (LUCENE-2935) Let Codec consume entire document
[ https://issues.apache.org/jira/browse/LUCENE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2935. - Resolution: Fixed the main infrastructure has been committed to the docvalues branch - moving out here Let Codec consume entire document - Key: LUCENE-2935 URL: https://issues.apache.org/jira/browse/LUCENE-2935 Project: Lucene - Java Issue Type: Improvement Components: core/codecs, core/index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Currently the codec API is limited to consuming Terms Postings upon a segment flush. To enable stored fields and DocValues to make use of the Codec abstraction, codecs should allow pulling a consumer ahead of flush time and consuming all values from a document's fields through a consumer API. An alternative to consuming the entire document would be extending FieldsConsumer to return a StoredValueConsumer / DocValuesConsumer like it is done in the DocValues branch right now, side by side with the TermsConsumer. Yet, extending this has proven to be very tricky and error prone for several reasons: * FieldsConsumer requires SegmentWriteState, which might be different upon flush compared to when the document is consumed. SegmentWriteState must therefore be created twice: 1. when the first docvalues field is indexed 2. when flushed. * FieldsConsumers are currently pulled for each indexed field no matter if there are terms to be indexed or not. Yet, if we use something like DocValuesCodec, which essentially wraps another codec and creates a FieldsConsumer on demand, the wrapped codec's consumer might not be initialized even if the field is indexed. This causes problems once such a field is opened but is missing the required files for that codec. I added some harsh logic to work around this which should be prevented. * SegmentCodecs are created for each SegmentWriteState, which might yield wrong codec IDs depending on how field numbers are assigned.
We currently depend on the fact that all fields for a segment, and therefore their codecs, are known when SegmentCodecs are built. To enable consuming per-doc values in codecs we need to do that incrementally. Codecs should instead provide a DocumentConsumer side by side with the FieldsConsumer created prior to flush. This is also a prerequisite for LUCENE-2621
[jira] [Updated] (LUCENE-3075) DocValues should optionally be stored in a PerCodec CFS file to prevent too many files in the index
[ https://issues.apache.org/jira/browse/LUCENE-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3075: Affects Version/s: (was: CSF branch) 4.0 Fix Version/s: (was: CSF branch) 4.0 update to 4.0 - fix once on trunk DocValues should optionally be stored in a PerCodec CFS file to prevent too many files in the index -- Key: LUCENE-3075 URL: https://issues.apache.org/jira/browse/LUCENE-3075 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently docvalues create one file per field to store the docvalues. Yet this could easily lead to too many open files, so we might need to enable CFS per codec to keep the number of files reasonable.
[jira] [Updated] (LUCENE-3074) SimpleTextCodec needs SimpleText DocValues impl
[ https://issues.apache.org/jira/browse/LUCENE-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3074: Affects Version/s: (was: CSF branch) 4.0 Fix Version/s: (was: CSF branch) 4.0 fix once on trunk SimpleTextCodec needs SimpleText DocValues impl --- Key: LUCENE-3074 URL: https://issues.apache.org/jira/browse/LUCENE-3074 Project: Lucene - Java Issue Type: Task Components: core/index, core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Michael McCandless Fix For: 4.0 Currently SimpleTextCodec uses binary docvalues; we should move that to a simple-text impl. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2186) First cut at column-stride fields (index values storage)
[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2186. - Resolution: Fixed currently landing on LUCENE-3108 First cut at column-stride fields (index values storage) Key: LUCENE-2186 URL: https://issues.apache.org/jira/browse/LUCENE-2186 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, mem.py I created an initial basic impl for storing index values (ie column-stride value storage). This is still a work in progress... but the approach looks compelling. I'm posting my current status/patch here to get feedback/iterate, etc. The code is standalone now, and lives under new package oal.index.values (plus some util changes, refactorings) -- I have yet to integrate into Lucene so eg you can mark that a given Field's value should be stored into the index values, sorting will use these values instead of field cache, etc. It handles 3 types of values: * Six variants of byte[] per doc, all combinations of fixed vs variable length, and stored either straight (good for eg a title field), deref (good when many docs share the same value, but you won't do any sorting) or sorted. * Integers (variable bit precision used as necessary, ie this can store byte/short/int/long, and all precisions in between) * Floats (4 or 8 byte precision) String fields are stored as the UTF8 byte[]. This patch adds a BytesRef, which does the same thing as flex's TermRef (we should merge them). This patch also adds basic initial impl of PackedInts (LUCENE-1990); we can swap that out if/when we get a better impl. This storage is dense (like field cache), so it's appropriate when the field occurs in all/most docs. 
It's just like field cache, except the reading API is a get() method invocation, per document. Next step is to do basic integration with Lucene, and then compare sort performance of this vs field cache. For the sort-by-String-value case, I think RAM usage and GC load of this index values API should be much better than field cache, since it does not create an object per document (instead it shares big long[] and byte[] across all docs), and because the values are stored in RAM as their UTF8 bytes. There are abstract Writer/Reader classes. The current reader impls are entirely RAM resident (like field cache), but the API is (I think) agnostic, ie, one could make an MMAP impl instead. I think this is the first baby step towards LUCENE-1231. Ie, it cannot yet update values, and the reading API is fully random-access by docID (like field cache), not like a posting list, though I do think we should add an iterator() api (to return flex's DocsEnum) -- eg I think this would be a good way to track avg doc/field length for BM25/lnu.ltc scoring. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
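The dense, field-cache-like layout described above can be illustrated with a small self-contained sketch in plain Java. The class and method names are invented for the illustration and are not Lucene code: the key idea is that all documents share one big byte[], and a per-document get() copies out a fixed-size slice instead of holding an object per document.

```java
// Illustrative sketch (not Lucene code) of the dense, fixed-length
// column-stride layout described in the patch: one shared byte[] holds the
// value for every document, back to back, and reading is a random-access
// get() keyed by docID. No per-document object is ever allocated at index
// load time, which is where the RAM/GC win over a per-doc object cache comes
// from.
public class FixedStraightBytesSketch {
    private final byte[] values; // all docs' values in one shared array
    private final int size;      // fixed value length per document

    public FixedStraightBytesSketch(int numDocs, int size) {
        this.values = new byte[numDocs * size];
        this.size = size;
    }

    public void set(int docID, byte[] value) {
        System.arraycopy(value, 0, values, docID * size, size);
    }

    // Random-access read per docID, like the field-cache-style get() above.
    public byte[] get(int docID) {
        byte[] out = new byte[size];
        System.arraycopy(values, docID * size, out, 0, size);
        return out;
    }

    public static void main(String[] args) {
        FixedStraightBytesSketch col = new FixedStraightBytesSketch(3, 2);
        col.set(0, new byte[] {1, 2});
        col.set(2, new byte[] {9, 9});
        System.out.println(java.util.Arrays.toString(col.get(2)));
    }
}
```

The variable-length, deref and sorted variants mentioned in the issue add an indirection (per-doc offsets or ordinals) on top of the same shared-array idea.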
[jira] [Assigned] (LUCENE-1231) Column-stride fields (aka per-document Payloads)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-1231: --- Assignee: Simon Willnauer (was: Michael Busch) Column-stride fields (aka per-document Payloads) Key: LUCENE-1231 URL: https://issues.apache.org/jira/browse/LUCENE-1231 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 This new feature has been proposed and discussed here: http://markmail.org/search/?q=per-document+payloads#query:per-document%20payloads+page:1+mid:jq4g5myhlvidw3oc+state:results Currently it is possible in Lucene to store data as stored fields or as payloads. Stored fields provide good performance if you want to load all fields for one document, because this is a sequential I/O operation. If you however want to load the data from one field for a large number of documents, then stored fields perform quite badly, because lots of I/O seeks might have to be performed. A better way to do this is using payloads. By creating a special posting list that has one posting with payload for each document you can simulate a column-stride field. The performance is significantly better compared to stored fields, however still not optimal. The reason is that for each document the freq value, which is in this particular case always 1, has to be decoded, and also one position value, which is always 0, has to be loaded. As a solution we want to add real column-stride fields to Lucene. 
A possible format for the new data structure could look like this (CSD stands for column-stride data; once we decide on a final name for this feature we can change this): CSDList -- FixedLengthList | VariableLengthList, SkipList FixedLengthList -- Payload^SegSize VariableLengthList -- DocDelta, PayloadLength?, Payload Payload -- Byte^PayloadLength PayloadLength -- VInt SkipList -- see frq.file We distinguish here between the fixed-length and the variable-length cases. To allow flexibility, Lucene could automatically pick the right data structure. This could work like this: When the DocumentsWriter writes a segment it checks whether all values of a field have the same length. If yes, it stores them as a FixedLengthList; if not, then as a VariableLengthList. When the SegmentMerger merges two or more segments it checks if all segments have a FixedLengthList with the same length for a column-stride field. If not, it writes a VariableLengthList to the new segment. Once this feature is implemented, we should think about making the column-stride fields updateable, similar to the norms. This will be a very powerful feature that can for example be used for low-latency tagging of documents. Other use cases: - replace norms - allow storing boost values separately from norms - as input for the FieldCache, thus providing significantly improved loading performance (see LUCENE-831) Things that need to be done here: - decide on a name for this feature :) - I think column-stride fields was liked better than per-document payloads - Design an API for this feature. We should keep in mind here that these fields are supposed to be updateable. - Define data structures. I would like to get this feature into 2.4. Feedback about the open questions is very welcome so that we can finalize the design soon and start implementing. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
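The fixed-vs-variable-length selection rule described in LUCENE-1231 above can be sketched as follows. This is illustrative plain Java, not Lucene code; the class, enum and method names are invented for the sketch.

```java
import java.util.List;

// Sketch of the format-selection rule from the LUCENE-1231 description
// (illustrative only): at flush time, if every value in the segment has the
// same length a fixed-length list suffices; otherwise fall back to a
// variable-length list. At merge time the result stays fixed-length only if
// all inputs are fixed-length with the same per-value length.
public class CsdFormatChooser {

    enum Format { FIXED_LENGTH, VARIABLE_LENGTH }

    // DocumentsWriter-side decision: inspect all values of one field.
    static Format choose(List<byte[]> payloads) {
        int len = payloads.isEmpty() ? -1 : payloads.get(0).length;
        for (byte[] p : payloads) {
            if (p.length != len) {
                return Format.VARIABLE_LENGTH;
            }
        }
        return Format.FIXED_LENGTH;
    }

    // SegmentMerger-side decision for two segments of the same field.
    static Format merge(Format a, int lenA, Format b, int lenB) {
        if (a == Format.FIXED_LENGTH && b == Format.FIXED_LENGTH && lenA == lenB) {
            return Format.FIXED_LENGTH;
        }
        return Format.VARIABLE_LENGTH;
    }

    public static void main(String[] args) {
        // prints FIXED_LENGTH then VARIABLE_LENGTH
        System.out.println(choose(List.of(new byte[4], new byte[4])));
        System.out.println(choose(List.of(new byte[4], new byte[2])));
    }
}
```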
[jira] [Resolved] (LUCENE-1231) Column-stride fields (aka per-document Payloads)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-1231. - Resolution: Duplicate this has been implemented in LUCENE-3108, LUCENE-2935, LUCENE-2168 and LUCENE-1231 moving out Column-stride fields (aka per-document Payloads) Key: LUCENE-1231 URL: https://issues.apache.org/jira/browse/LUCENE-1231 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2463) Using an evaluator outside the scope of an entity results in a null context
Jeffrey, can you supply some more information like data-config.xml, stacktrace and what your delta-query looks like? [ https://issues.apache.org/jira/browse/SOLR-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046346#comment-13046346 ] Jeffrey Chang commented on SOLR-2463: - I just tried delta-imports on 3.2, this is still unresolved. I also tried applying SOLR-2186 patch but no luck. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046467#comment-13046467 ] Robert Muir commented on LUCENE-3186: - do we really need to do this? I guess also looking at LUCENE-3187, I think I'm against this trend. Shall we put analyzer classnames in there too? If we are going to put docvalues type and precision step, well then I want the stopwords file in the fnx file too! At some point, if a user is going to shoot themselves in the foot, we simply cannot stop them, and I don't think it's our job to. DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently the segment merger fails if the docvalues type is not compatible across segments. We already catch this problem if somebody changes the value type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers. I marked this 4.0 since it should not block the landing on trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2372) Upgrade Solr to Tika 0.9
[ https://issues.apache.org/jira/browse/SOLR-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2372: -- Component/s: contrib - Solr Cell (Tika extraction) Priority: Major (was: Trivial) Fix Version/s: 3.3 Marking for 3.3 and bumping priority to major due to the good cost/benefit ratio, especially for PDF parsing. I'd love to contribute but I think this kind of change cannot be done with a patch. Upgrade Solr to Tika 0.9 Key: SOLR-2372 URL: https://issues.apache.org/jira/browse/SOLR-2372 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Grant Ingersoll Fix For: 3.3 as the title says -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3108: Attachment: LUCENE-3108_CHANGES.patch here is a changes entry for docvalues - comments welcome Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch It's time to move another feature from branch to trunk. I want to start this process now while still a couple of issues remain on the branch. Currently I am down to a single nocommit (javadocs on DocValues.java) and a couple of testing TODOs (explicit multithreaded tests and unoptimized with deletions), but I think those are not worth separate issues, so we can resolve them as we go. The already created issues (LUCENE-3075 and LUCENE-3074) should not block this process here IMO; we can fix them once we are on trunk. Here is a quick feature overview of what has been implemented: * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, Bytes (fixed / variable size, each in sorted, straight and deref variations) * Integration into the flex API; Codec provides a PerDocConsumer / DocValuesConsumer (write) and PerDocValues / DocValues (read) * Enabled by default in all codecs except PreFlex * Follows other flex-API patterns, like non-segment readers throwing UOE, forcing MultiPerDocValues if on a DirReader, etc. * Integration into IndexWriter, FieldInfos etc. 
* Random testing enabled via RandomIW - injecting random DocValues into documents * Basic checks in CheckIndex (which runs after each test) * FieldComparator for int and float variants (sorting, currently integrated directly into SortField; this might go into a separate DocValuesSortField eventually) * Extended TestSort for DocValues * RAM-resident random-access API plus on-disk DocValuesEnum (currently only sequential access) - Source.java / DocValuesEnum.java * Extensible cache implementation for RAM-resident DocValues (by default loaded into RAM only once and freed once the IR is closed) - SourceCache.java PS: Currently the RAM-resident API is named Source (Source.java), which seems too generic. I think we should rename it to RamDocValues or something like that; suggestions welcome! Any comments, questions (rants :)) are very much appreciated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046473#comment-13046473 ] Uwe Schindler commented on LUCENE-3108: --- One small issue: There seems to be a merge missing in file TestIndexSplitter, the changes in there are unrelated, so this reverts a commit on trunk for improving tests. The problem with the README.txt is already fixed. ...still digging Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046474#comment-13046474 ] Simon Willnauer commented on LUCENE-3108: - bq. There seems to be a merge missing in file TestIndexSplitter, the changes in there are unrelated, so this reverts a commit on trunk for improving tests. fixed revision 1133794 thanks uwe! Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #147: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/147/ No tests ran. Build Log (for compile errors): [...truncated 8340 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Remove @version tags from JDocs
+1 On Thu, Jun 9, 2011 at 6:24 AM, Simon Willnauer simon.willna...@googlemail.com wrote: hey folks, in solr and some lucene classes we have @version tags with svn $Id stuff in there, which we got rid of in lucene a while ago. I went through all classes and removed them. I just want to check with everybody if it's ok to commit that. Note: I only changed javadocs; all other usage of $Id etc still remains. simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046479#comment-13046479 ] Uwe Schindler commented on LUCENE-3186: --- Hi Robert, I am also not really happy with this trend. I just opened LUCENE-3187 to start a discussion. In my opinion we should improve the documentation instead. DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3187) Store NumericField precisionStep in fnx file
[ https://issues.apache.org/jira/browse/LUCENE-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046480#comment-13046480 ] Uwe Schindler commented on LUCENE-3187: --- Robert commented on LUCENE-3186: {quote} do we really need to do this? I guess also looking at LUCENE-3187, I think I'm against this trend. Shall we put analyzer classnames in there too? If we are going to put docvalues type and precision step, well then I want the stopwords file in the fnx file too! At some point, if a user is going to shoot themselves in the foot, we simply cannot stop them, and I don't think it's our job to. {quote} I am also not really happy with this trend. I just opened LUCENE-3187 to start a discussion. In my opinion we should improve the documentation instead. Store NumericField precisionStep in fnx file Key: LUCENE-3187 URL: https://issues.apache.org/jira/browse/LUCENE-3187 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.9, 3.0, 3.1, 3.2 Reporter: Uwe Schindler This is a similar problem to LUCENE-3186: The following question was sent to the user list: [http://mail-archives.apache.org/mod_mbox/lucene-java-user/201106.mbox/%3c614c529d389a5944b351f7dfb7594f24012aa...@uksrpblkexb01.detica.com%3E] The main problem is that you have to pass the precision step and must know the field type of numeric fields before doing a query, else you get wrong results. We can maybe store the type and precision step in the fnx file (like we do for stored numeric fields in FieldsWriter). I am not sure what's the best way to do it (without too much code specialization), but it seems a good idea. On the other hand, we don't store references to the Analyzer in the fnx file, so why for numeric fields (it's just like an analyzer - if you change it, results are wrong)? -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3186) DocValues type should be recored in FNX file to early fail if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046481#comment-13046481 ] Simon Willnauer commented on LUCENE-3186: - I think for this issue we can compute that info at IW open time: we can simply run through the FIs and prepopulate the info. I think this is better than redundantly storing this info. DocValues type should be recorded in FNX file to fail early if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
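The fail-fast idea from the comment above (prepopulate the per-field type from the existing field infos at IndexWriter open time and reject incompatible later updates) might be sketched like this. This is plain Java with invented names, not the real FieldInfos API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the comment above: on writer open, run through the
// known fields once, remember each field's DocValues type as a string, and
// reject an attempt to index the same field with an incompatible type -
// failing at add time instead of much later at segment merge. Names and the
// string-typed representation are assumptions made for this sketch.
public class DocValuesTypeCheck {

    private final Map<String, String> typeByField = new HashMap<>();

    // Prepopulate from the index's existing field infos at open time.
    public DocValuesTypeCheck(Map<String, String> existingFieldInfos) {
        typeByField.putAll(existingFieldInfos);
    }

    // Called when a field is about to be indexed with a DocValues type.
    public void check(String field, String type) {
        String prev = typeByField.putIfAbsent(field, type);
        if (prev != null && !prev.equals(type)) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" already has DocValues type " + prev + ", cannot change to " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("price", "FLOAT_64");
        DocValuesTypeCheck check = new DocValuesTypeCheck(existing);
        check.check("price", "FLOAT_64"); // compatible: ok
        check.check("price", "INTS");     // incompatible: throws
    }
}
```

This keeps the check entirely in memory, matching the comment's point that nothing new needs to be stored redundantly in the fnx file.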
[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046496#comment-13046496 ] Simon Willnauer commented on LUCENE-2955: - Mike, nice work so far :) I have to admit that I really don't like the reopen thread. I think reopen in the background should be abstracted, and the reopen thread should not be part of the core manager. By default I think we should consult a ReopenStrategy on change and hijack indexing threads to reopen the reader. We can still synchronize the reopening with a lock.tryLock() and by default go with a timed reopen policy. Thoughts? simon Add utility class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch I created a simple class, NRTManager, that tries to abstract away some of the reopen logic when using NRT readers. You give it your IW, tell it the min and max nanoseconds of staleness you can tolerate, and it privately runs a reopen thread to periodically reopen the searcher. It subsumes the SearcherManager from LIA2. Besides running the reopen thread, it also adds the notion of a generation containing changes you've made. So eg it has addDocument, returning a long. You can then take that long value and pass it back to the getSearcher method, and getSearcher will return a searcher that reflects the changes made in that generation. This gives your app the freedom to force immediate consistency (ie wait for the reopen) only for those searches that require it, like a verifier that adds a doc and then immediately searches for it, but also use eventual consistency for other searches. I want to also add support for the new applyDeletions option when pulling an NRT reader. 
Also, this is very new and I'm sure buggy -- the concurrency is either wrong or overly locking. But it's a start... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
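The "generation" bookkeeping described above can be modeled in a few lines. This is a self-contained sketch of the idea only, not the NRTManager API; all class and method names below are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the generation idea: each change bumps an indexing
// generation, a reopen publishes the generation it covered, and a caller
// can check whether a searcher already reflects a given change.
public class GenerationModel {
    private final AtomicLong indexingGen = new AtomicLong(0); // last change made
    private volatile long searcherGen = 0;                    // last change visible to searches

    // Analogous to addDocument(...) returning a generation token.
    public long addDocument() {
        return indexingGen.incrementAndGet();
    }

    // Analogous to the periodic reopen: the new searcher covers all changes so far.
    public void reopen() {
        searcherGen = indexingGen.get();
    }

    // Analogous to getSearcher(gen): the change is visible only once a
    // reopen has caught up to the requested generation.
    public boolean isVisible(long gen) {
        return searcherGen >= gen;
    }
}
```

A verifier-style caller would hold the long returned by addDocument() and search only once isVisible(gen) is true (waiting for, or triggering, a reopen); callers that can accept eventual consistency simply ignore the generation.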
[jira] [Updated] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3108: Attachment: LUCENE-3108.patch Here is the latest diff for docvalues. I will now reintegrate the branch and post diffs later. Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch It's time to move another feature from branch to trunk. I want to start this process now while a couple of issues still remain on the branch. Currently I am down to a single nocommit (javadocs on DocValues.java) and a couple of testing TODOs (explicit multithreaded tests and unoptimized with deletions), but I think those are not worth separate issues, so we can resolve them as we go. The already created issues (LUCENE-3075 and LUCENE-3074) should not block this process IMO; we can fix them once we are on trunk. Here is a quick feature overview of what has been implemented: * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, Bytes (fixed / variable size, each in sorted, straight and deref variations) * Integration into the flex API; a Codec provides a PerDocConsumer-DocValuesConsumer (write) / PerDocValues-DocValues (read) * Enabled by default in all codecs except PreFlex * Follows other flex-API patterns, e.g. non-segment readers throw UOE, forcing MultiPerDocValues if on a DirReader, etc. * Integration into IndexWriter, FieldInfos etc. 
* Random testing enabled via RandomIW - injecting random DocValues into documents * Basic checks in CheckIndex (which runs after each test) * FieldComparator for int and float variants (sorting; currently directly integrated into SortField, this might go into a separate DocValuesSortField eventually) * Extended TestSort for DocValues * RAM-resident random access API plus on-disk DocValuesEnum (currently only sequential access) - Source.java / DocValuesEnum.java * Extensible cache implementation for RAM-resident DocValues (by default loaded into RAM only once and freed once the IR is closed) - SourceCache.java PS: Currently the RAM-resident API is named Source (Source.java), which seems too generic. I think we should rename it to RamDocValues or something like that; suggestions welcome! Any comments, questions (rants :)) are very much appreciated.
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3183: Attachment: LUCENE-3183_test.patch I tried to debug this a little last night... it's some off-by-one in reset() (this shoves a negative ord into the terms dictionary cache, which jacks things up later). The test passes on 3.x; I also generated a 3.x index and CheckIndex'd it with trunk to verify that the problem isn't in PreFlex-RW but is actually in PreFlex-R... but I didn't manage to come up with any non-hacky solution for the off-by-one... TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Attachments: LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 [junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] 
[junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at
[jira] [Commented] (SOLR-2580) Create a new Search Component to alter queries based on business rules.
[ https://issues.apache.org/jira/browse/SOLR-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046514#comment-13046514 ] Tomás Fernández Löbbe commented on SOLR-2580: - Basically, it's just another component designed to modify the relevance of documents, as the QueryElevationComponent is. Of course, this could be implemented by each site on the application layer, but I think it would be very helpful to write one reusable component; then everybody can use the same one, nobody reinvents the wheel, and the effort can go into improving it. Should it be included in Solr? Personally I think this is something that can be useful to many people and it will add value to Solr. In the end, the community and the committers will decide whether they think this is worthwhile or not. JBoss AS is the application server, but JBoss is also an organization that runs many projects (like Drools). You don't need to use any application server in particular to make Drools work. It's a library, not an application itself. Create a new Search Component to alter queries based on business rules. Key: SOLR-2580 URL: https://issues.apache.org/jira/browse/SOLR-2580 Project: Solr Issue Type: New Feature Reporter: Tomás Fernández Löbbe The goal is to be able to adjust the relevance of documents based on user-defined business rules. For example, in an e-commerce site, when the user chooses the shoes category, we may be interested in boosting products from a certain brand. This can be expressed as a rule in the following way: rule Boost Adidas products when searching shoes when $qt : QueryTool() TermQuery(term.field==category, term.text==shoes) then $qt.boost({!lucene}brand:adidas); end The QueryTool object should be used to alter the main query in an easy way. 
Even more human-like rules can be written: rule Boost Adidas products when searching shoes when Query has term shoes in field product then Add boost query {!lucene}brand:adidas end These rules are written in a text file in the config directory and can be modified at runtime. Rules will be managed using JBoss Drools: http://www.jboss.org/drools/drools-expert.html On a first stage, it will allow adding boost queries or changing sort fields based on the user query, but it could be extended to allow more options.
[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046518#comment-13046518 ] Chris Male commented on LUCENE-2955: I agree with Simon. I think providing a ReopenStrategy abstraction will be helpful. Add utility class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch
[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure
[ https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046522#comment-13046522 ] Robert Muir commented on LUCENE-3111: - {quote} I can reproduce - Gabriele's test class's setUp() method calls super.setUp(), but when I run the test the error message about needing to call super.setUp() is emitted, and the test fails. I don't know how to diagnose this problem, though. {quote} You must use junit 4.7 (not 4.8). In junit 4.8 TestWatchMan.starting() is fired before the @Befores, but not in 4.7 (This behavior annoyed me in 4.7 by the way). I definitely don't mind opening a new issue to switch to 4.8 as a minimum requirement. TestFSTs.testRandomWords failure Key: LUCENE-3111 URL: https://issues.apache.org/jira/browse/LUCENE-3111 Project: Lucene - Java Issue Type: Bug Reporter: selckin Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: LUCENE-3111.patch Was running some while(1) tests on the docvalues branch (r1103705) and the following test failed: {code} [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs [junit] Testcase: testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED [junit] expected:771 but was:TwoLongs:771,771 [junit] junit.framework.AssertionFailedError: expected:771 but was:TwoLongs:771,771 [junit] at org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940) [junit] at org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915) [junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) [junit] [junit] [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec [junit] [junit] - Standard Error - [junit] NOTE: Ignoring nightly-only test method 'testBigSet' [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0 [junit] NOTE: test params are: codec=PreFlex, locale=ar, timezone=America/Blanc-Sablon [junit] NOTE: all tests run in this JVM: [junit] [TestToken, TestCodecs, TestIndexReaderReopen, TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, TestUnicodeUtil, TestFSTs] [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 (64-bit)/cpus=8,threads=1,free=137329960,total=208207872 [junit] - --- [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED {code} I am not able to reproduce -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3108: Attachment: LUCENE-3108.patch Patch that reflects the last changes to sync with trunk after I ran svn merge -reintegrate. The reintegrated branch looks good, no unchanged additions etc. I think we are ready to land this on trunk... I will wait a day or two in case somebody has objections. Here is my +1 to commit. Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046532#comment-13046532 ] Koji Sekiguchi commented on SOLR-2582: -- Duplicate of SOLR-2579 ? UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3 In UIMAUpdateRequestProcessor, the catch block in the processAdd() method can throw a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
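The failure mode described above reduces to a one-line guard on substring. A minimal self-contained sketch of the bug and the usual fix (the class and method names are hypothetical, not the actual UIMAUpdateRequestProcessor code):

```java
// Model of the SOLR-2582 failure: composing an error snippet with
// text.substring(0, 100) throws when the text is shorter than 100 chars.
public class ErrorSnippet {
    // Buggy variant: implicitly assumes the text is at least 100 chars long.
    static String snippetBuggy(String text) {
        return text.substring(0, 100); // StringIndexOutOfBoundsException for short text
    }

    // Fixed variant: clamp the end index to the actual text length.
    static String snippetFixed(String text) {
        return text.substring(0, Math.min(100, text.length()));
    }

    public static void main(String[] args) {
        String shortText = "tiny doc";
        try {
            snippetBuggy(shortText);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("buggy variant threw: " + e.getMessage());
        }
        System.out.println(snippetFixed(shortText)); // safe for any length
    }
}
```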
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046534#comment-13046534 ] Elmer Garduno commented on SOLR-2582: - Sorry, it seemed to me to be a duplicate, but I see it's a different problem. I've removed the link. UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3
[jira] [Commented] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046537#comment-13046537 ] Ryan McKinley commented on LUCENE-3108: --- +1 This looks great. To avoid more svn work, I think committing soon is better than later. Land DocValues on trunk --- Key: LUCENE-3108 URL: https://issues.apache.org/jira/browse/LUCENE-3108 Project: Lucene - Java Issue Type: Task Components: core/index, core/search, core/store Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046544#comment-13046544 ] Tommaso Teofili commented on SOLR-2582: --- I think they're related, but the approach proposed here is slightly different since it considers the uniqueKey instead of the analyzed text as the alternative to the logField. Maybe the best solution is applying the patch in SOLR-2579 and then making the error message more useful with additional debugging information. UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3
[jira] [Commented] (SOLR-1804) Upgrade Carrot2 to 3.2.0
[ https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046545#comment-13046545 ] David Smiley commented on SOLR-1804: Good point Rob. If any use of Guava in a patch to Solr core is going to get reverted, then we might as well recognize that now and move Guava from Solr's lib to clustering's lib directory. Upgrade Carrot2 to 3.2.0 Key: SOLR-1804 URL: https://issues.apache.org/jira/browse/SOLR-1804 Project: Solr Issue Type: Improvement Components: contrib - Clustering Reporter: Grant Ingersoll Assignee: Grant Ingersoll Fix For: 3.1, 4.0 Attachments: SOLR-1804-carrot2-3.4.0-dev-trunk.patch, SOLR-1804-carrot2-3.4.0-dev.patch, SOLR-1804-carrot2-3.4.0-libs.zip, SOLR-1804.patch, carrot2-core-3.4.0-jdk1.5.jar http://project.carrot2.org/release-3.2.0-notes.html Carrot2 is now LGPL free, which means we should be able to bundle the binary!
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046564#comment-13046564 ] Yonik Seeley commented on SOLR-2583: Yeah, this will help for sparse fields, but hurt quite a bit for non-sparse ones. Seems like we should make it an option (sparse=true/false on the fieldType definition)? Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring consumes a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the total number of docs (this is also done if the file to load is not found). If there are far fewer entries in the scoring file than docs in total, the big float array wastes a lot of memory. This could be optimized by using a map of doc -> score, so that the map contains only as many entries as there are scoring entries in the external file.
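The tradeoff being discussed can be sketched in a few lines. This is a self-contained illustration of dense-array versus sparse-map storage, not the actual FileFloatSource code; all names below are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the SOLR-2583 memory tradeoff: a dense float[] costs 4 bytes
// per document in the index regardless of how many docs are scored, while
// a doc -> score map costs per *entry* (boxed, so more per entry), winning
// only when the scoring file is sparse relative to maxDoc.
public class ExternalScores {
    // Dense representation: one slot per doc, default score 0.0f.
    static float[] denseScores(int maxDoc, Map<Integer, Float> fileEntries) {
        float[] scores = new float[maxDoc]; // ~4 * maxDoc bytes, even if nearly empty
        for (Map.Entry<Integer, Float> e : fileEntries.entrySet()) {
            scores[e.getKey()] = e.getValue();
        }
        return scores;
    }

    // Sparse representation: only docs present in the external file are stored.
    static float lookupSparse(Map<Integer, Float> scores, int doc) {
        return scores.getOrDefault(doc, 0.0f); // default score for unscored docs
    }

    public static void main(String[] args) {
        Map<Integer, Float> entries = new HashMap<>();
        entries.put(3, 2.5f); // a single scored doc
        int maxDoc = 1_000_000;
        // Dense: ~4 MB for 1M docs even with one entry; sparse: one map entry.
        float[] dense = denseScores(maxDoc, entries);
        System.out.println(dense[3] + " " + lookupSparse(entries, 42));
    }
}
```

This also shows why Yonik's sparse=true/false option makes sense: for a scoring file covering most docs, the dense array is both smaller and faster than a boxed map, so neither representation dominates.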
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #144: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/144/ No tests ran. Build Log (for compile errors): [...truncated 7478 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3183: --- Attachment: LUCENE-3183.patch Patch. Turns out this is a long standing corner-case bug... the problem only happens if you seek to the empty term (field= and text=), and you use termsIndexInterval=1. TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Attachments: LUCENE-3183.patch, LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 [junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] [junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, 
java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) [junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) [junit] at
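The corner case McCandless describes (seeking the empty term with termsIndexInterval=1) can be modeled with a small standalone sketch. This is not Lucene's actual TermInfosReader code; it just shows how a standard binary-search-with-adjustment over the term index yields position -1 for a term that sorts before every indexed term — which the empty term always does — and how clamping to 0 avoids the AIOOBE:

```java
import java.util.Arrays;

/**
 * Minimal model (not Lucene's code) of the LUCENE-3183 corner case: with
 * termsIndexInterval=1 every term is in the index, and seeking the empty
 * term ("" sorts before everything) makes the adjusted binary search
 * return -1, which is then used to index an array.
 */
public class SeekEmptyTermDemo {
    static final String[] INDEX_TERMS = {"apple", "banana", "cherry"};

    // Buggy variant: returns -1 for any term sorting before the first entry.
    public static int indexPositionBuggy(String term) {
        int i = Arrays.binarySearch(INDEX_TERMS, term);
        // Not found: binarySearch returns -(insertionPoint) - 1; the usual
        // adjustment points at the preceding index term, which is -1 when
        // the sought term sorts before INDEX_TERMS[0].
        return i >= 0 ? i : (-i - 1) - 1;
    }

    // Fixed variant: clamp to 0 so the subsequent array access is safe.
    public static int indexPosition(String term) {
        return Math.max(0, indexPositionBuggy(term));
    }
}
```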
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046597#comment-13046597 ] Robert Muir commented on LUCENE-3183: - nice, is there an alternative to the extra if per scan()? Like, my hack (not sure if it's correct) was to never add -1 to the terms cache... so this would affect fewer queries (e.g. range queries and MTQs, since they bypass the cache anyway)?
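Muir's alternative — never adding -1 to the terms cache — can be sketched as a cache wrapper that refuses to store the sentinel position. The class and method names here are invented for illustration, not Lucene's API; the point is only that a cache hit can then never hand a negative ordinal back to code that indexes an array with it:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hedged sketch of "never cache -1" (hypothetical names, not Lucene's
 * TermInfosReader): the -1 "before first term" position is simply never
 * stored, so it is recomputed on the next lookup instead of being served
 * from the cache to a caller that would use it as an array index.
 */
public class TermOrdCache {
    private final Map<String, Integer> cache = new HashMap<>();

    public void put(String term, int ord) {
        if (ord >= 0) { // skip the sentinel; a miss forces a fresh seek
            cache.put(term, ord);
        }
    }

    public Integer get(String term) {
        return cache.get(term); // null = miss, caller re-seeks
    }
}
```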
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046602#comment-13046602 ] Michael McCandless commented on LUCENE-3186: I think there are a few separate questions here... Today, on the doc values branch, if you mix up your doc values, i.e. a field foo is at first indexed as a FLOAT_32 and then later you change your mind and later docs index field foo as BYTES_FIXED_STRAIGHT, then this is bad news right now because everything will index fine, you can close your IW, etc., but at some later time merges will hit unrecoverable exceptions. You'll have no choice but to fully rebuild the index, which is rather awful. However, this is true even for cases you would expect to work, e.g. say foo was BYTES_FIXED_STRAIGHT but then later you decided you will want to sort on this field and so you use BYTES_FIXED_SORTED. (Simon: this also results in an exception, I think...?) Ideally we should do the right thing here and upgrade the BYTES_FIXED_STRAIGHT to BYTES_FIXED_SORTED (I think) -- Simon, is there an issue open for this? So, I think the first question here is: which cases should be merged properly and which should be considered an error? Probably we have to work out the full matrix... Then the second question is, for the error cases (if any!), can/should we detect this up front, as you're indexing? Then the third question is, if we want to detect up front, do we do that w/ the fnx file or do we do that on init of IW (= no index change). DocValues type should be recorded in FNX file to early fail if user specifies incompatible type -- Key: LUCENE-3186 URL: https://issues.apache.org/jira/browse/LUCENE-3186 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Currently the segment merger fails if the doc values type is not compatible across segments. We already catch this problem if somebody changes the values type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers. I marked this 4.0 since it should not block the landing on trunk -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
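The "fail early" idea being debated — reject a conflicting doc values type at add-document time instead of letting a merge blow up much later — can be sketched with a per-field type registry. The enum values mirror the type names mentioned in the thread, but the class and method names are hypothetical, not Lucene's API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hedged sketch of early doc-values type checking (hypothetical API):
 * remember the first type seen per field and reject a conflicting type
 * immediately, rather than discovering the mismatch during a merge.
 */
public class DocValuesTypeRegistry {
    public enum Type { FLOAT_32, BYTES_FIXED_STRAIGHT, BYTES_FIXED_SORTED }

    private final ConcurrentMap<String, Type> types = new ConcurrentHashMap<>();

    public void check(String field, Type type) {
        Type prev = types.putIfAbsent(field, type);
        if (prev != null && prev != type) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" already uses " + prev
                + ", cannot also index it as " + type);
        }
    }
}
```

A fuller version would encode the compatibility matrix McCandless asks for (e.g. treating a STRAIGHT-to-SORTED upgrade as legal) instead of requiring exact equality.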
[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type
[ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046604#comment-13046604 ] Robert Muir commented on LUCENE-3186: - {quote} So, I think the first question here is: which cases should be merged properly and which should be considered an error? Probably we have to work out the full matrix... {quote} These are all implementation details of doc values that it must deal with during merging. I think it should work out the LCD and merge to that. This is no different than if I have a field with all 8-character terms and then I add a 10-character term: sure, my impl/codec's encoding could internally rely upon the fact that all terms are 8 chars, but it must transparently change its encoding to then support both 8- and 10-character terms and not throw an error. If you mix up your doc values with ints and floats and bytes, isn't the least common denominator always bytes? (Just encode the int as 4 bytes or whatever.) So in other words, I think it's up to doc values to change its encoding to support the LCD, which might mean downgrading ints to bytes or whatever; my only opinion is that it should never 'create' data (this was my issue with fake norms, let's not do that).
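Muir's "just encode the int as 4 bytes" point is easy to demonstrate: widening an int or float to a fixed-length byte[] is lossless and round-trips exactly, so merging mixed numeric fields down to a bytes representation never has to invent data. This is a standalone illustration, not Lucene's merge code:

```java
import java.nio.ByteBuffer;

/**
 * Illustration of the least-common-denominator idea: an int or float
 * widens losslessly to a fixed 4-byte array, so mixed int/float/bytes
 * fields could merge down to a BYTES_FIXED_* representation instead of
 * failing. Sketch only, not Lucene's actual merge logic.
 */
public class LcdEncodeDemo {
    public static byte[] intAsBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static byte[] floatAsBytes(float v) {
        return ByteBuffer.allocate(4).putFloat(v).array();
    }

    public static int bytesAsInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static float bytesAsFloat(byte[] b) {
        return ByteBuffer.wrap(b).getFloat();
    }
}
```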
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046609#comment-13046609 ] Michael McCandless commented on LUCENE-3183: bq. nice, is there an alternative to if per-scan()? I think your idea should work; the bug is really in STE.scanTo, but since we only call this method in 2 places, and these classes are package-private in 3.x, and I think it's unlikely apps will directly use STE from the PreFlex codec on trunk, I think we can work around it in these places. You're right that this saves an if in many cases... I'll put comments explaining it.
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3183: Attachment: LUCENE-3183.patch here's my hack patch
[jira] [Updated] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3183: --- Attachment: LUCENE-3183.patch Patch using Robert's idea... I think it's ready to commit.
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046618#comment-13046618 ] Robert Muir commented on LUCENE-3183: - +1, I think the comments are definitely necessary... this code is tricky :)
[jira] [Resolved] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3183. Resolution: Fixed Assignee: Michael McCandless
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046622#comment-13046622 ] Michael McCandless commented on LUCENE-3183: Thanks selckin! Keep feeding that awesome random-number-generator you've got over there!!
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046627#comment-13046627 ] Robert Muir commented on LUCENE-3183: - I agree, i guestimated (running -Dtests.iter=1 and seeing 5 fails) the chance of finding this seed is like 1-in-2000! TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Assignee: Michael McCandless Attachments: LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 [junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] [junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, 
source=flush, os.arch=amd64, java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) 
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) [junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) [junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) [junit]
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046635#comment-13046635 ] Michael McCandless commented on LUCENE-2793: bq. It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfo I think we should pass down readOnce=true for these cases? EG some kind of caching dir (or something) would know not to bother caching such files... Same for del docs, terms index, doc values (well, sometimes), etc. bq. it seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext to SegmentWriteState and SegmentReadState that way we don't need to change the Codec interface and clutter it with internals. This would also fix mikes comment for FieldsConsumer etc. +1 that's great. bq. I really don't like OneMerge I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now. If we can't include OneMerge, and I agree it'd be nice not to, I think we should try hard to pull stuff out of OneMerge that may be of interest to a Dir impl? Maybe: * estimatedTotalSegmentSizeBytes * docCount * optimize/expungeDeletes * isExternal (so Dir can know if this is addIndexes vs normal merging) bq. Regarding the IOContext class I think we should design for what we have right now and since SegementInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above. I agree, let's wait until we have a need. In fact... SegmentInfo for flush won't work: we go and open all files for flushing, write to them, close them, and only then do we make the SegmentInfo. So it seems like we should also have some abtracted stuff about the to-be-flushed segment? 
Maybe for starters the estimatedSegmentSizeBytes? EG, NRTCachingDir could use this to decide whether to cache the new segment (today it fragile-ly relies on the app to open new NRT reader frequently enough). Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice/versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
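The IOContext shape being discussed in this thread (a small context object passed to Directory.createOutput/openInput, carrying abstracted merge details instead of OneMerge itself, plus hints like readOnce) might look roughly like the sketch below. The class and field names here are illustrative guesses based on the comments above, not the API that was ultimately committed.

```java
// Illustrative sketch of the IOContext idea: a value class a Directory impl
// can inspect to adapt buffering/caching to how a file will be used.
public class IOContextSketch {
    public enum Context { READ, MERGE, FLUSH, DEFAULT }

    /** Abstracted merge details, instead of exposing IndexWriter's OneMerge. */
    public static final class MergeInfo {
        public final long estimatedMergeBytes;
        public final int totalDocCount;
        public final boolean isExternal; // addIndexes vs. normal merging

        public MergeInfo(long estimatedMergeBytes, int totalDocCount, boolean isExternal) {
            this.estimatedMergeBytes = estimatedMergeBytes;
            this.totalDocCount = totalDocCount;
            this.isExternal = isExternal;
        }
    }

    public final Context context;
    public final MergeInfo mergeInfo; // null unless context == MERGE
    public final boolean readOnce;    // hint: file is read fully once (e.g. FieldInfos)

    public IOContextSketch(Context context, MergeInfo mergeInfo, boolean readOnce) {
        this.context = context;
        this.mergeInfo = mergeInfo;
        this.readOnce = readOnce;
    }

    /** E.g. a caching Directory (like NRTCachingDir) could skip read-once files
     *  and cache merge output only when it fits a budget. */
    public boolean shouldCache(long cacheBudgetBytes) {
        if (readOnce) return false;
        if (context == Context.MERGE && mergeInfo != null) {
            return mergeInfo.estimatedMergeBytes <= cacheBudgetBytes;
        }
        return true;
    }
}
```

The shouldCache method is only a toy policy showing how a Directory could use such hints; real caching decisions would be up to each Directory implementation.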
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046641#comment-13046641 ] Michael McCandless commented on LUCENE-3179: I think we should just commit this? It's a useful API. LUCENE-3171 (alternative nested docs impl w/ single pass collector) also could use this. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 . -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
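The operation being proposed (find the nearest set bit at or below a given index, the mirror image of nextSetBit) can be sketched over a raw long[] as follows. This is a self-contained illustration of the algorithm, not OpenBitSet's actual code.

```java
// Sketch of prevSetBit: scan backwards from 'index', first masking off the
// bits above 'index' in its word, then walking to lower words as needed.
public class PrevSetBitSketch {
    /** Returns the index of the first set bit at or below 'index', or -1 if none. */
    public static int prevSetBit(long[] bits, int index) {
        if (index < 0 || bits.length == 0) return -1;
        int word = index >> 6;            // which long holds this bit
        if (word >= bits.length) {        // clamp to the last stored word
            word = bits.length - 1;
            index = (word << 6) + 63;
        }
        // Keep only bits 0..index within the current word.
        long w = bits[word] & (-1L >>> (63 - (index & 63)));
        while (true) {
            if (w != 0) {
                // Highest set bit of w, offset by the word's base index.
                return (word << 6) + 63 - Long.numberOfLeadingZeros(w);
            }
            if (--word < 0) return -1;
            w = bits[word];
        }
    }
}
```

For the nested-document use case mentioned above, a collector would call this with a child document's id to locate its parent (the preceding set bit in the parent filter).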
[jira] [Commented] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046644#comment-13046644 ] Jason Rutherglen commented on LUCENE-2955: -- Perhaps we can merge this functionality with SOLR-2565 and/or SOLR-2566, such that Solr utilizes it for reader opening. However why would this issue use a background thread and Solr performs a max time reopen? Add utitily class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch I created a simple class, NRTManager, that tries to abstract away some of the reopen logic when using NRT readers. You give it your IW, tell it min and max nanoseconds staleness you can tolerate, and it privately runs a reopen thread to periodically reopen the searcher. It subsumes the SearcherManager from LIA2. Besides running the reopen thread, it also adds the notion of a generation containing changes you've made. So eg it has addDocument, returning a long. You can then take that long value and pass it back to the getSearcher method and getSearcher will return a searcher that reflects the changes made in that generation. This gives your app the freedom to force immediate consistency (ie wait for the reopen) only for those searches that require it, like a verifier that adds a doc and then immediately searches for it, but also use eventual consistency for other searches. I want to also add support for the new applyDeletions option when pulling an NRT reader. Also, this is very new and I'm sure buggy -- the concurrency is either wrong over overly-locking. But it's a start... -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
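The "generation" contract described in the issue (each change returns a token; a searcher reflects a generation only after a reopen covers it) can be illustrated with a deliberately simple, single-threaded toy. This mimics the API idea only; the real NRTManager adds the background reopen thread and concurrency control.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the generation idea: addDocument returns a generation token,
// refresh() plays the role of the periodic reopen, and isSearchable(gen)
// tells the caller whether a searcher covering that change exists yet.
public class GenerationSketch {
    private final List<String> pending = new ArrayList<>();
    private final List<String> visible = new ArrayList<>();
    private long indexedGen = 0;    // generation of the latest change
    private long searchableGen = 0; // generation covered by the current "searcher"

    /** Like NRTManager.addDocument: returns the generation holding this change. */
    public long addDocument(String doc) {
        pending.add(doc);
        return ++indexedGen;
    }

    /** Stand-in for the background reopen: make all pending changes searchable. */
    public void refresh() {
        visible.addAll(pending);
        pending.clear();
        searchableGen = indexedGen;
    }

    /** True once a searcher reflecting 'gen' is available. */
    public boolean isSearchable(long gen) {
        return searchableGen >= gen;
    }

    public boolean search(String doc) {
        return visible.contains(doc);
    }
}
```

An app needing immediate consistency (the add-then-verify case in the description) would hold the returned generation and wait until isSearchable(gen) before searching; other searches just use whatever generation is current.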
[Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuild them). Thanks Bob
RE: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
Lucene.Net 2.9.2 should be able to read the index created with 1.9 without any problem. Can you try to search with luke (http://www.getopt.org/luke/luke-0.9.9/lukeall-0.9.9.jar ) and iterate over the results? DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuilt them). Thanks Bob=
[jira] [Resolved] (LUCENE-3152) MockDirectoryWrapper should wrap the lockfactory
[ https://issues.apache.org/jira/browse/LUCENE-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3152. - Resolution: Fixed MockDirectoryWrapper should wrap the lockfactory Key: LUCENE-3152 URL: https://issues.apache.org/jira/browse/LUCENE-3152 Project: Lucene - Java Issue Type: Bug Components: general/test Reporter: Robert Muir Fix For: 3.3, 4.0 Attachments: LUCENE-3152.patch After applying the patch from LUCENE-3147, I added a line to make the test fail if it cannot remove its temporary directory. I ran 'ant test' on linux 50 times, and it passed all 50 times. But on windows, it failed often because of write.lock... this is because of unclosed writers in the test. MockDirectoryWrapper is currently unaware of this write.lock, I think it should wrap the lockfactory so that .close() will fail if there are any outstanding locks. Then hopefully these tests would fail on linux too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3152) MockDirectoryWrapper should wrap the lockfactory
[ https://issues.apache.org/jira/browse/LUCENE-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046652#comment-13046652 ] Robert Muir commented on LUCENE-3152: - oops i meant to close this MockDirectoryWrapper should wrap the lockfactory Key: LUCENE-3152 URL: https://issues.apache.org/jira/browse/LUCENE-3152 Project: Lucene - Java Issue Type: Bug Components: general/test Reporter: Robert Muir Fix For: 3.3, 4.0 Attachments: LUCENE-3152.patch After applying the patch from LUCENE-3147, I added a line to make the test fail if it cannot remove its temporary directory. I ran 'ant test' on linux 50 times, and it passed all 50 times. But on windows, it failed often because of write.lock... this is because of unclosed writers in the test. MockDirectoryWrapper is currently unaware of this write.lock, I think it should wrap the lockfactory so that .close() will fail if there are any outstanding locks. Then hopefully these tests would fail on linux too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
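The fix described in this issue amounts to tracking every lock a test acquires and failing the directory's close() while any lock is outstanding. A simplified, self-contained sketch of that idea (the interfaces here are stand-ins, not Lucene's actual LockFactory API):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: remember each acquired lock by name; close() fails fast if any
// lock (e.g. write.lock from an unclosed IndexWriter) was never released.
public class LockTrackingSketch {
    public interface Lock { void release(); }

    private final Set<String> openLocks = new HashSet<>();

    /** Acquire a named lock, remembering it until it is released. */
    public Lock makeLock(final String name) {
        if (!openLocks.add(name)) {
            throw new IllegalStateException("lock already held: " + name);
        }
        return new Lock() {
            public void release() { openLocks.remove(name); }
        };
    }

    /** Like MockDirectoryWrapper.close(): fail if locks are outstanding. */
    public void close() {
        if (!openLocks.isEmpty()) {
            throw new IllegalStateException("unreleased locks: " + openLocks);
        }
    }
}
```

This is why the wrapped lock factory makes such tests fail on Linux too: the leak is detected at close() rather than only surfacing as an un-deletable file on Windows.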
[jira] [Resolved] (LUCENE-3106) commongrams filter calls incrementToken() after it returns false
[ https://issues.apache.org/jira/browse/LUCENE-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3106. - Resolution: Fixed Fix Version/s: (was: 3.3) 3.2 4.0 this was fixed in LUCENE-3113 commongrams filter calls incrementToken() after it returns false Key: LUCENE-3106 URL: https://issues.apache.org/jira/browse/LUCENE-3106 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Fix For: 4.0, 3.2 Attachments: LUCENE-3106.patch, LUCENE-3106_test.patch In LUCENE-3064, we beefed up MockTokenizer with assertions, and I started cutting over some analysis tests to use MockTokenizer for better coverage. The commongrams tests fail, because they call incrementToken() after it already returns false. In general its my understanding consumers should not do this (and i know of a few tokenizers that will actually throw exceptions if you do this, just like java iterators and such). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
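The consumer contract described above (never call incrementToken() again after it returns false) is the same discipline as java.util.Iterator. A minimal checking wrapper in the spirit of MockTokenizer's assertions, using plain strings instead of the real TokenStream API:

```java
import java.util.Iterator;

// A strict token source that, like MockTokenizer's assertions, throws if a
// consumer keeps calling incrementToken() after exhaustion. Simplified
// illustration only; not the real MockTokenizer.
public class StrictTokenStream {
    private final Iterator<String> tokens;
    private boolean exhausted = false;
    private String current;

    public StrictTokenStream(Iterator<String> tokens) { this.tokens = tokens; }

    /** Returns false at end of stream; calling again afterwards is a consumer bug. */
    public boolean incrementToken() {
        if (exhausted) {
            throw new IllegalStateException("incrementToken() called after it returned false");
        }
        if (tokens.hasNext()) { current = tokens.next(); return true; }
        exhausted = true;
        return false;
    }

    public String token() { return current; }
}
```

A filter like CommonGramsFilter sits between such a source and the consumer, which is why its own extra call to the upstream incrementToken() trips the assertion.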
[Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not (and I assume it is not), is there some way I can convert existing indexes? (in production we have many indexes containing about 200 million docs so I'd much rather convert existing indexes than rebuild them). Thanks Bob
RE: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
One more point: some write operations using Lucene.Net 2.9.2 (add, delete, optimize, etc.) automatically upgrade your index to 2.9.2. But if your index is somehow corrupted (e.g., due to some bug in 1.9) this may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuild them). Thanks Bob
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046674#comment-13046674 ] Martin Grotzke commented on SOLR-2583: -- Yes, you're right regarding non-sparse fields. The question for the user will be when to use true or false for sparse. It might also be the case, that files differ, in that some are big, others are small. So I'm thinking about making it adaptive: when the number of lines reach a certain percentage compared to the number of docs, the float array is used, otherwise the doc-score map is used. Perhaps it would be good to allow the user to override this, s.th. like sparse=yes/no/auto. What do you think? Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046675#comment-13046675 ] Robert Muir commented on SOLR-2583: --- a smallfloat option could help too? (1/4 the ram) Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: JCC usage failure
Hi Petrus, On Thu, 9 Jun 2011, Petrus Hyvönen wrote: Thank you for the support. Finally I managed to get it working by reserving some words, and minimizing the number of wrapped methods by just including those that I specifically need: python -m jcc --jar orekit-5.0.jar --include commons-math-2.2.jar --package java.io --package org.apache.commons.math.geometry --shared --python orekit --reserved INFINITE --reserved NO_DATA --reserved ERROR --install --build Is there a way to influence the generated docstrings (__doc__ function?), or is there any way of converting from a javadoc to docstrings of the wrapped library? :) If there is a way to get at Java docstrings from the Java reflection API, then that would be a very cool addition to JCC ! Andi..
Distributed search capability
Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046688#comment-13046688 ] Yonik Seeley commented on SOLR-2583: bq. Perhaps it would be good to allow the user to override this, s.th. like sparse=yes/no/auto. Sounds good! I wonder what the memory cut-off should be for auto... 10% of maxDoc() or so? bq. a smallfloat option could help too? (1/4 the ram) Yep! Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
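The sparse=yes/no/auto idea above could be sketched as follows: pick a map of doc→score when few documents have external scores, and the existing float[maxDoc] otherwise. The 10% cutoff is just the value floated in the comment; class and method names are illustrative, not FileFloatSource's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of adaptive sparse/dense storage for external scores.
public class ExternalScoresSketch {
    public static final double AUTO_SPARSE_CUTOFF = 0.10; // fraction of maxDoc

    private final float[] dense;              // used when non-sparse
    private final Map<Integer, Float> sparse; // used when sparse
    private final float defaultScore;

    public ExternalScoresSketch(Map<Integer, Float> entries, int maxDoc, float defaultScore) {
        this.defaultScore = defaultScore;
        if (entries.size() < AUTO_SPARSE_CUTOFF * maxDoc) {
            this.sparse = new HashMap<>(entries);
            this.dense = null;
        } else {
            this.sparse = null;
            this.dense = new float[maxDoc];
            Arrays.fill(dense, defaultScore);
            for (Map.Entry<Integer, Float> e : entries.entrySet()) {
                dense[e.getKey()] = e.getValue();
            }
        }
    }

    public boolean isSparse() { return sparse != null; }

    public float score(int doc) {
        if (sparse != null) {
            Float v = sparse.get(doc);
            return v != null ? v : defaultScore;
        }
        return dense[doc];
    }
}
```

The trade-off behind the cutoff: a boxed HashMap entry costs far more per document than a 4-byte array slot, so the map only wins when the file is genuinely sparse relative to maxDoc.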
Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I tried converting the index using IndexWriter as follows: Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer()); writer.SetMaxBufferedDocs(2); writer.SetMaxMergeDocs(100); writer.SetMergeFactor(2); writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) }); writer.Commit(); That seems to work (I get what looks like a valid index directory at least). But still when I run some tests using IndexSearcher I get the same problem (I get documents in Collect() which are larger than IndexReader.MaxDoc()). Any idea what the problem could be? BTW, this is a problem because I look up some fields (date ranges, etc.) in some custom collectors which filter out documents, and it assumes I don't get any documents larger than maxDoc. Thanks, Bob On Jun 9, 2011, at 12:37 PM, Digy wrote: One more point, some write operations using Lucene.Net 2.9.2 (add, delete, optimize etc.) upgrades automatically your index to 2.9.2. But if your index is somehow corrupted(eg, due to some bug in 1.9) this may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2?
If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs so I'd rather convert existing indexes than rebuild them). Thanks Bob
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046692#comment-13046692 ] Martin Grotzke commented on SOLR-2583: -- Great, sounds like a further optimization for both sparse and non-sparse files. Though, as we had 4GB taken by FileFloatSource objects a reduction to 1/4 would still be too much for us so for our case I prefer the map based approach - then with Smallfloat. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc - score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
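The "smallfloat" suggestion stores each score in one byte instead of four, which is where the 1/4-RAM figure comes from. Below is a deliberately simple stand-in showing the idea via linear quantization of scores in [0, max]; Lucene's actual SmallFloat class uses a tiny exponent/mantissa format instead, and this sketch is not its API.

```java
// One-byte score storage: 256 levels over [0, max]. Precision is traded
// for a 4x memory reduction versus float[].
public class ByteScoreSketch {
    private final float max;

    public ByteScoreSketch(float max) { this.max = max; }

    /** Encode a score in [0, max] into one of 256 levels (clamping outliers). */
    public byte encode(float score) {
        float clamped = Math.min(Math.max(score, 0f), max);
        return (byte) Math.round(255f * clamped / max);
    }

    /** Decode back to an approximate float score. */
    public float decode(byte b) {
        return (b & 0xFF) * max / 255f;
    }
}
```

Note this is orthogonal to the sparse/dense question: a byte[] still scales with maxDoc, which is why a doc→score map remains preferable for the very sparse case described above.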
[jira] [Commented] (SOLR-2529) DIH update trouble with sql field name pk
[ https://issues.apache.org/jira/browse/SOLR-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046695#comment-13046695 ] Shawn Heisey commented on SOLR-2529: I ran into a similar problem, but it had nothing to do with the name of the field. java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='did' In my dih-config.xml file I have this. The idea is simply to return a guaranteed result very quickly, so that it can then execute the deltaImportQuery, which as it happens is identical to the main query for a full-import: deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataView}" The result just has a column called MAX(did), not did. The following change made it work, because it has the right field name to match the primary key in your DIH config: deltaQuery="SELECT MAX(did) AS did FROM ${dataimporter.request.dataView}" Hopefully your problem is similar and can be easily solved in this way, but if not, this issue will still be here. DIH update trouble with sql field name pk --- Key: SOLR-2529 URL: https://issues.apache.org/jira/browse/SOLR-2529 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.1, 3.2 Environment: Debian Lenny, JRE 6 Reporter: Thomas Gambier Priority: Blocker We are unable to use the DIH when the database primary key column is named pk. The reported Solr error is: deltaQuery has no column to resolve to declared primary key pk='pk' We have made some investigations and found that the DIH makes a mistake when it looks for the primary key in the row's column list:
{code}
private String findMatchingPkColumn(String pk, Map<String, Object> row) {
  // row contains pk only when its value was null, hence the error below
  if (row.containsKey(pk))
    throw new IllegalArgumentException(String.format(
        "deltaQuery returned a row with null for primary key %s", pk));
  String resolvedPk = null;
  // look for a column that is a dotted prefix/suffix match of the declared pk
  for (String columnName : row.keySet()) {
    if (columnName.endsWith("." + pk) || pk.endsWith("." + columnName)) {
      if (resolvedPk != null)
        throw new IllegalArgumentException(String.format(
            "deltaQuery has more than one column (%s and %s) that might resolve to declared primary key pk='%s'",
            resolvedPk, columnName, pk));
      resolvedPk = columnName;
    }
  }
  if (resolvedPk == null)
    throw new IllegalArgumentException(String.format(
        "deltaQuery has no column to resolve to declared primary key pk='%s'", pk));
  LOG.info(String.format(
      "Resolving deltaQuery column '%s' to match entity's declared pk '%s'",
      resolvedPk, pk));
  return resolvedPk;
}
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
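Shawn's aliasing fix fits into a delta-import entity definition along these lines; this is a hedged sketch, and the entity name and column list are hypothetical stand-ins, not taken from the actual config — only the AS did alias in deltaQuery is the point:

```xml
<!-- Hypothetical dih-config.xml fragment illustrating the alias fix.
     Entity/column names are examples; the alias makes the deltaQuery
     result column match the entity's declared pk. -->
<entity name="item" pk="did"
        query="SELECT did, name FROM ${dataimporter.request.dataView}"
        deltaImportQuery="SELECT did, name FROM ${dataimporter.request.dataView}"
        deltaQuery="SELECT MAX(did) AS did FROM ${dataimporter.request.dataView}"/>
```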
[jira] [Updated] (LUCENE-3177) Decouple indexer from Document/Field impls
[ https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3177: --- Attachment: LUCENE-3177.patch New patch, removing IndexableDocument so now we only have IndexableField and IW accepts Iterable<? extends IndexableField> to add/updateDocument. This breaks one Lucene core test (TestDocBoost), because indexer no longer applies doc boost. I'd like to cut a new branch, and commit this starting patch there. I think (hopefully) the plan for the branch will be something like this: * Commit/iterate on this issue, which fully decouples indexer (oal.index.*) from our current Field/Fieldable/AbstractField/Document impl. This gives LUCENE-2308 more freedom to make concrete user space classes. * Commit/iterate on LUCENE-2308, which collapses the *Field hierarchy to one concrete class, and adds FieldType hierarchy. * Maybe: do LUCENE-2309 (decouple analyzers from indexer). This would mean IndexableField no longer needs isTokenized, nor the string/readerValue() methods. Indexer would just ask for the tokenStream, and the doc/field impl would go and look at its flags like NOT_ANALYZED, etc., to figure out what token stream to create. Decouple indexer from Document/Field impls -- Key: LUCENE-3177 URL: https://issues.apache.org/jira/browse/LUCENE-3177 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3177.patch, LUCENE-3177.patch I think we should define minimal iterator interfaces, IndexableDocument/Field, that indexer requires to index documents. Indexer would consume only these bare minimum interfaces, not the concrete Document/Field/FieldType classes from oal.document package. Then, the Document/Field/FieldType hierarchy is one concrete impl of these interfaces. Apps are free to make their own impls as well.
Maybe eventually we make another impl that enforces a global schema, eg factored out of Solr's impl. I think this frees design pressure on our Document/Field/FieldType hierarchy, ie, these classes are free to become concrete fully-featured user-space classes with all sorts of friendly sugar APIs for adding/removing fields, getting/setting values, types, etc., but they don't need substantial extensibility/hierarchy. Ie, the extensibility point shifts to the IndexableDocument/Field interface. I think this means we can collapse the three classes we now have for a Field (Fieldable/AbstractField/Field) down to a single concrete class (well, except for LUCENE-2308 where we want to break out dedicated classes for different field types...). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect the problem. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 8:40 PM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I tried converting the index using IndexWriter as follows:

Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer());
writer.SetMaxBufferedDocs(2);
writer.SetMaxMergeDocs(100);
writer.SetMergeFactor(2);
writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
writer.Commit();

That seems to work (I get what looks like a valid index directory at least). But still when I run some tests using IndexSearcher I get the same problem (I get documents in Collect() which are larger than IndexReader.MaxDoc()). Any idea what the problem could be? BTW, this is a problem because I look up some fields (date ranges, etc.) in some custom collectors which filter out documents, and they assume I don't get any documents larger than maxDoc. Thanks, Bob On Jun 9, 2011, at 12:37 PM, Digy wrote: One more point: some write operations using Lucene.Net 2.9.2 (add, delete, optimize etc.) automatically upgrade your index to 2.9.2. But if your index is somehow corrupted (e.g., due to some bug in 1.9) this may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I get IndexOutOfRange exceptions in my collectors. It is giving me document IDs that are larger than maxDoc.
My index contains 377831 documents, and IndexReader.MaxDoc() is returning 377831, but I get documents from Collect() with large values (for instance 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If not, is there some way I can convert it (in production we have many indexes containing about 200 million docs, so I'd rather convert existing indexes than rebuild them). Thanks, Bob
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046707#comment-13046707 ] Yonik Seeley commented on SOLR-2564: bq. Actually, the worst case is twice as slow due to unneeded caching of a simple query. bq. Sorry, what do you mean here? The worst case with this patch as a whole (due to the caching by default). This type of query is twice as slow: {code} http://localhost:8983/solr/select?q=*:*&group=true&group.field=single1000_i {code} Which led me to wonder how complex queries must be before the caching is a win. Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046712#comment-13046712 ] Martin Grotzke commented on SOLR-2583: -- Sounds good! I wonder what the memory cut-off should be for auto... 10% of maxDoc() or so? I'd compare both strategies to see where the break-even is; this should give an absolute number. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring uses a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are many fewer entries in the scoring file than there are docs in total, the big float array wastes much memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but no more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046715#comment-13046715 ] Michael McCandless commented on LUCENE-2308: I created a new branch, where we can iterate on these interlinked issues: {noformat} https://svn.apache.org/repos/asf/lucene/dev/branches/fieldtype {noformat} And I committed the initial patch from LUCENE-3177, decoupling indexer from the doc/field impl. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: managing CHANGES.txt?
: you just commit it to the version it was added. : : For example, if you are adding something to 3x and trunk, commit it to : the 3x section of trunk's CHANGES.txt : then when you svn merge, there will be no merge conflict, it will just work. That assumes you know, before committing to trunk, that it will (or won't) be backported to 3x. The approach (and the cleanness of the merges) completely breaks down if you start out assuming a feature is targeting 4x, and then later decide to backport it. It will also break down in even more fun and confusing ways if/when we have our first 4.0 release and then someone pushes for having a 3.42 feature release after that (to push out some high value features to people not yet ready to upgrade to 4.0), because the changes legitimately need to show up in both the 3.42 and 4.1 release notes. I've tried to raise these concerns several times in the past and gotten virtually no response... http://markmail.org/message/s6zq4e7aomanxulp http://search.lucidimagination.com/search/document/9a9b1327fe281305/solr_changes_3_1_4_0 I really think that the 4.0 section of CHANGES should list *every* change on the trunk prior to the 4.0 release, even if it was backported to 3.1 or 3.3 -- because fundamentally the changes are not necessarily identical. A bug fix that has been backported may be subtly different because of the differences between the branches. I also (still) agree with Ryan about the historic record nature of CHANGES.txt not making sense anymore now that we have multiple feature release branches going at once... Can we delete everything past line 441 in: https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt and add a comment saying to look at: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/CHANGES.txt +1 -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046766#comment-13046766 ] Michael McCandless commented on SOLR-2564: -- Ahh, I see. Could we turn off caching if the query is instanceof AllDocsQuery? Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on grouping module is going well. I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
solr example != solr defaults
Trying to catch up on my email/jira, I notice this comment from rmuir in SOLR-2519... I think we need to stop kidding ourselves about example/default and just recognize that 99.999% of users just use the example as their default configuration. Guys, the example is the default, there is simply no argument, this is the reality! While I agree that we should recognize and expect solr users to start with the example configs and use them as their default configs, under no circumstances should we get in the habit of referring to things specified in those configs as the default behavior or the default settings. This isn't a question of kidding ourselves, it's a question of genuinely confusing users about the difference between behavior that exists because of what is in the example configs that they may have copied, and behavior that exists because of hardcoded defaults in java code. Example #1: for backwards compatibility, the default lockType used in solr when no <lockType/> declaration is found is simple, but the *example* <lockType/> declared in the *example* configs is native. Example #2: Many request handler instances are declared/configured in the example solrconfig.xml file, but only 1 request handler instance will exist by *default* if the user removes those <requestHandler/> declarations from the solrconfig.xml. The point is: If you find yourself getting into the habit of referring to config values/settings in the example configs as the defaults, then you *will* mislead users into thinking that you are describing the default behavior when those values/settings are absent from the configs. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
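Hoss's first example can be made concrete; the snippet below is a hedged illustration of the example-vs-default distinction, not copied from the shipped solrconfig.xml:

```xml
<!-- What the *example* solrconfig.xml declares (an example setting,
     not a built-in default): -->
<lockType>native</lockType>
<!-- With no <lockType/> declaration at all, the hardcoded fallback in
     the Java code is "simple", kept for backwards compatibility. -->
```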
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046785#comment-13046785 ] Robert Muir commented on SOLR-2583: --- bq. Though, as we had 4GB taken by FileFloatSource objects, a reduction to 1/4 would still be too much for us, so for our case I prefer the map-based approach - then with SmallFloat. If the problem is sparsity, maybe use a two-stage table; still faster than a hashmap and much better for the worst case. Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor Attachments: FileFloatSource.java.patch External scoring uses a lot of memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are many fewer entries in the scoring file than there are docs in total, the big float array wastes much memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but no more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
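The "two-stage table" Robert mentions can be sketched as a paged float array: only pages that actually receive an external-file entry are allocated, so memory scales with the touched pages rather than with maxDoc, while lookups stay two array reads instead of a hash probe. Class and method names here are illustrative, not from Solr's FileFloatSource:

```java
// Hedged sketch of a two-stage (paged) per-doc float table.
public class TwoStageFloatTable {
  private static final int PAGE_BITS = 12;            // 4096 floats per page
  private static final int PAGE_SIZE = 1 << PAGE_BITS;
  private static final int PAGE_MASK = PAGE_SIZE - 1;

  private final float[][] pages;                      // first stage: one slot per page
  private final float defaultValue;                   // score for untouched docs

  public TwoStageFloatTable(int maxDoc, float defaultValue) {
    this.pages = new float[(maxDoc + PAGE_SIZE - 1) / PAGE_SIZE][];
    this.defaultValue = defaultValue;
  }

  public void set(int doc, float value) {
    float[] page = pages[doc >>> PAGE_BITS];
    if (page == null) {                               // lazily allocate second stage
      page = new float[PAGE_SIZE];
      java.util.Arrays.fill(page, defaultValue);
      pages[doc >>> PAGE_BITS] = page;
    }
    page[doc & PAGE_MASK] = value;
  }

  public float get(int doc) {                         // two array reads, no hashing
    float[] page = pages[doc >>> PAGE_BITS];
    return page == null ? defaultValue : page[doc & PAGE_MASK];
  }
}
```

If the scoring file touches only a small, clustered subset of docs, most first-stage slots stay null, which is the sparsity win over one float per doc; the worst case (every page touched) degrades gracefully to roughly the flat array's footprint.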
Re: managing CHANGES.txt?
On Thu, Jun 9, 2011 at 3:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : you just commit it to the version it was added. : : For example, if you are adding something to 3x and trunk, commit it to : the 3x section of trunk's CHANGES.txt : then when you svn merge, there will be no merge conflict, it will just work. That assumes you know, before committing to trunk, that it will (or won't) be backported to 3x. The approach (and the cleanness of the merges) completely breaks down if you start out assuming a feature is targeting 4x, and then later decide to backport it. you just first move your change to the 3.x section? it will also break down in even more fun and confusing ways if/when we have our first 4.0 release and then someone pushes for having a 3.42 feature release after that (to push out some high value features to people not yet ready to upgrade to 4.0) because the changes legitimately need to show up in both the 3.42 and 4.1 release notes. we already raised this issue and decided against it for a number of reasons; it was raised on the dev list and everyone voted +1 http://www.lucidimagination.com/search/document/a42f9a22fe39c4b4/discussion_trunk_and_stable_release_strategy#67815ec25c055810 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: solr example != solr defaults
On Thu, Jun 9, 2011 at 3:47 PM, Chris Hostetter hossman_luc...@fucit.org wrote: The point is: If you find yourself getting into the habit of referring to config values/settings in the example configs as the defaults, then you *will* mislead users into thinking that you are describing the default behavior when those values/settings are absent from the configs. I'm not really going to get hung up on the technicalities here. We can call what happens when there is no configuration the fallback settings, if that's less confusing, but to me the example is the defaults. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
I found the problem. I have a custom query optimizer that replaces certain TermQuerys within a Boolean query with a custom Query, and this query has its own weight/scorer that retrieves matching documents from an in-memory cache (which is not Lucene-backed). But it looks like my custom hit collectors are now wrapped in a HitCollectorWrapper, which assumes Collect() needs to be called for multiple segments - so it is adding a start offset to the doc ID that comes from my custom query implementation. I looked at the new Collector class and it seems it works the same way (assumes it needs to set the next index reader with some offset). How can I make my custom query work with the new API (so that there is basically a single segment in RAM that my query uses, but other query clauses in the same boolean query still use multiple Lucene segments)? I am sure that is not clear and will try to provide more detail soon. Thanks, Bob On Jun 9, 2011, at 1:48 PM, Digy wrote: Sorry no idea. Maybe optimizing the index with 2.9.2 can help to detect the problem. DIGY
Re: managing CHANGES.txt?
On Thu, Jun 9, 2011 at 4:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : The approach (and the cleanness of the merges) completely breaks down if : you start out assuming a feature is targeting 4x, and then later decide : to backport it. : : you just first move your change to the 3.x section? so you're saying that to backport something from trunk to 3x the process should be: * first you should commit a change to trunk's CHANGES.txt moving the previously committed entry to the appropriate 3.x.y section * then you should merge the *two* commits to the 3x branch ? I think so? I guess in general, most things unless they are super-scary tend to get backported immediately to 3.x (and you know up-front you are going to do this) so in practice this hasn't been a problem? : we already raised this issue and decided against it for a number of : reasons, it was raised on the dev list and everyone voted +1 : : http://www.lucidimagination.com/search/document/a42f9a22fe39c4b4/discussion_trunk_and_stable_release_strategy#67815ec25c055810 I contest your characterization of everyone, but clearly I missed that thread back when it happened. That only addresses the issue of 3.x feature releases after 4.0 comes out -- but it still doesn't address the problem of bug fixes backported from 4.x to 3.x after 4.0 -- those will still be a serious problem if we keep removing things from the trunk CHANGES.txt when backporting. OK, well everyone that did vote, voted +1. If you disagree please respond to that thread! I think it would make things confusing if we released 4.0 say today, then released 3.3 later, and 4.0 couldn't read 3.3 indexes... but please reply to it. As far as bugfix releases, in lucene we have always had this issue (e.g. if we do 3.2.1 we have the issue now). That's why we have in our ReleaseTODO a task where we deal with this (and I noticed it had been missing from one of the bugfix 3.0.x releases and fixed that for 3.2).
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
hey jason, you are talking about the RMI contrib/remote? It was dropped a while ago since everybody rolls their own mechanism and some queries / filters didn't work with it. simon On Thu, Jun 9, 2011 at 7:29 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046845#comment-13046845 ] Martijn van Groningen commented on SOLR-2564: - {quote} But I think caching should still default to on, just limited as a pctg of the number of docs in the index. Ie, by default we will cache the result set if it's less than 20% (say) of total docs in your index. {quote} Maybe instead of specifying a maximum size for the second pass cache, we could specify it as a percentage (0 to 100) relative to maxDoc. That way, as the index grows in number of documents, the cache is still used for a lot of queries (depending on the specified percentage). So if we go with this, maybe group.cacheMB should be renamed to group.cache.percentage. The default can then be something like 20. Any thoughts about this? Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on grouping module is going well. I think it is time to wire this up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: managing CHANGES.txt?
: we already raised this issue and decided against it for a number of : reasons, it was raised on the dev list and everyone voted +1 : : http://www.lucidimagination.com/search/document/a42f9a22fe39c4b4/discussion_trunk_and_stable_release_strategy#67815ec25c055810 I contest your characterization of everyone, but clearly I missed that thread back when it happened. That only addresses the issue of 3.x feature releases after 4.0 comes out -- but it still doesn't address the problem of bug fixes backported from 4.x to 3.x after 4.0 -- those will still be a serious problem if we keep removing things from the trunk CHANGES.txt when backporting. OK, well everyone that did vote, voted +1. If you disagree please respond to that thread! I think it would make things confusing if we released 4.0 say today, then released 3.3 later, and 4.0 couldn't read 3.3 indexes... but please reply to it. The release strategy and CHANGES strategy seem different (but related) to me. I agree with the release strategy outlined in that thread, but don't see how it answers questions about maintaining CHANGES.txt. The thing that seems weird is that the historic release info in CHANGES.txt is potentially different than what will presumably be released in the 3.x branch. For example right now, if you take the 3.x lucene/CHANGES and paste them in the right place on trunk, there are a bunch of diffs for names with accents:
- have been deleted. (Christian Kohlsch├╝tter via Mike McCandless)
+ have been deleted. (Christian Kohlschⁿtter via Mike McCandless)
but also real differences like:
-* LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
-  multi-segment index (Michael McCandless)
The same exercise in solr/CHANGES.txt reveals lots of differences. Is this expected? It seems more like a by-product of trying to keep things in sync.
I suppose that could be fixed with some good To simplify the process, I suggest we remove historic info from /trunk and point people to the CHANGES in the current stable branch (3.x) -- when /trunk is moved to /branch_4x we would move everything there. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
Right, if that's not around, one needs to use multi searcher, that's gone too? On Jun 9, 2011 2:39 PM, Simon Willnauer simon.willna...@googlemail.com wrote: hey jason, you are talking about the RMI contrib/remote? It was dropped a while ago since everybody rolls its own mechanism and some queries / filters didn't work with it. simon On Thu, Jun 9, 2011 at 7:29 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2955: --- Attachment: LUCENE-2955.patch OK, new patch, folding in Simon's and Chris's feedback (thanks!). I pulled out the reopen thread into a separate class, so that one can now instantiate NRTManager but do their own reopening (no bg reopen thread). So eg if you want to hijack indexing threads to do reopen, you can. But if you want to simply reopen on a periodic basis with the bg thread, instantiate NRTManagerReopenThread, passing it the manager and your max and min staleness. Max staleness applies when no caller is waiting for a specific indexing change; min applies when one is. I didn't implement a ReopenStrategy... I think that should live above this class. But, I did add a WaitingListener so that such a reopener can be notified when someone is waiting for a specific generation to be visible (NRTManagerReopenThread uses that). Add utility class to manage NRT reopening - Key: LUCENE-2955 URL: https://issues.apache.org/jira/browse/LUCENE-2955 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3 Attachments: LUCENE-2955.patch, LUCENE-2955.patch, LUCENE-2955.patch I created a simple class, NRTManager, that tries to abstract away some of the reopen logic when using NRT readers. You give it your IW, tell it min and max nanoseconds staleness you can tolerate, and it privately runs a reopen thread to periodically reopen the searcher. It subsumes the SearcherManager from LIA2. Besides running the reopen thread, it also adds the notion of a generation containing changes you've made. So eg it has addDocument, returning a long. You can then take that long value and pass it back to the getSearcher method and getSearcher will return a searcher that reflects the changes made in that generation.
This gives your app the freedom to force immediate consistency (ie wait for the reopen) only for those searches that require it, like a verifier that adds a doc and then immediately searches for it, but also to use eventual consistency for other searches. I want to also add support for the new applyDeletions option when pulling an NRT reader. Also, this is very new and I'm sure buggy -- the concurrency is either wrong or overly-locking. But it's a start...

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
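The generation idea above can be sketched in a few lines of plain Java. This is NOT the NRTManager API from the patch -- the class and method names here (GenerationTracker, recordChange, reopened, waitForGeneration) are hypothetical, purely to illustrate the pattern: each indexing change bumps a generation counter, a reopen advances the searchable generation, and a caller wanting immediate consistency waits until the searchable generation catches up to the one its change landed in.

```java
// Minimal sketch of the generation-tracking pattern described above.
// Illustrative only; not the actual NRTManager API.
public class GenerationTracker {
    private long indexingGen = 0;   // generation of the latest indexed change
    private long searchingGen = 0;  // generation covered by the current searcher

    // Analogous to addDocument returning a long: record a change and
    // hand back the generation that contains it.
    public synchronized long recordChange() {
        return ++indexingGen;
    }

    // Called after a successful reopen: everything indexed up to 'gen'
    // is now visible to searches.
    public synchronized void reopened(long gen) {
        if (gen > searchingGen) {
            searchingGen = gen;
            notifyAll(); // wake any caller waiting on this generation
        }
    }

    // A caller wanting immediate consistency blocks here with the long
    // it got back from recordChange, until a reopen catches up.
    public synchronized void waitForGeneration(long gen) throws InterruptedException {
        while (searchingGen < gen) {
            wait();
        }
    }

    public synchronized boolean isVisible(long gen) {
        return searchingGen >= gen;
    }

    public static void main(String[] args) {
        GenerationTracker t = new GenerationTracker();
        long gen = t.recordChange();          // "addDocument" -> generation 1
        System.out.println(t.isVisible(gen)); // false: no reopen yet
        t.reopened(gen);                      // reopen thread catches up
        System.out.println(t.isVisible(gen)); // true: change is now searchable
    }
}
```

In the real class the reopen would be driven either by the bg NRTManagerReopenThread (bounded by max/min staleness) or by a hijacked indexing thread, while callers needing a specific change use the generation they were handed back.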
Re: Distributed search capability
On 6/10/11 12:10 AM, Jason Rutherglen wrote: Right, if that's not around, one needs to use multi searcher, that's gone too? Yes, and rightfully so - it didn't properly handle some query types, so you would actually get wrong results. For now the answer is: use Solr if you are less advanced, or roll your own (and contribute it back!) if you are more advanced ;) -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046911#comment-13046911 ] Michael McCandless commented on SOLR-2564: -- +1

Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch

Since work on the grouping module is going well, I think it is time to wire it up in Solr. Besides the current grouping features Solr provides, Solr will then also support second-pass caching and total count based on groups.
[JENKINS] Lucene-3.x - Build # 400 - Failure
Build: https://builds.apache.org/job/Lucene-3.x/400/ No tests ran. Build Log (for compile errors): [...truncated 9153 lines...]
[jira] [Commented] (SOLR-2582) UIMAUpdateRequestProcessor error handling with small texts
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046927#comment-13046927 ] Koji Sekiguchi commented on SOLR-2582: --

bq. it's possible to get the uniquekey from the SolrCore passed in the initialize() method

Yep, we've got the solrCore. That was a blind spot -- I don't know why I passed over it!

bq. I think they're related but the approach proposed here is slightly different since it considers the uniquekey instead of the analyzed text as the alternative to the logField. Maybe the best solution is applying the patch in SOLR-2579 and then making the error message more useful with other debugging information.

Will do.

UIMAUpdateRequestProcessor error handling with small texts -- Key: SOLR-2582 URL: https://issues.apache.org/jira/browse/SOLR-2582 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Tommaso Teofili Fix For: 3.3

In UIMAUpdateRequestProcessor, the catch block in the processAdd() method can hit a StringIndexOutOfBoundsException while composing the error message if the logging field is not set and the text being processed is shorter than 100 chars (...append(text.substring(0, 100))...).
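The exception described in the issue comes from Java's String.substring, which throws StringIndexOutOfBoundsException when the end index exceeds the string's length. Clamping the end index to the text length is the usual fix. A minimal self-contained sketch (the SnippetUtil class and snippet method are hypothetical helper names, not the actual Solr code):

```java
public class SnippetUtil {
    // text.substring(0, 100) throws StringIndexOutOfBoundsException when
    // text.length() < 100; clamping the end index with Math.min makes the
    // call safe for texts of any length.
    public static String snippet(String text, int maxLen) {
        return text.substring(0, Math.min(maxLen, text.length()));
    }

    public static void main(String[] args) {
        System.out.println(snippet("short text", 100)); // prints "short text"
        System.out.println(snippet("0123456789", 5));   // prints "01234"
    }
}
```

Applied to the error-message code in processAdd(), this turns the unconditional append(text.substring(0, 100)) into an append that works even when the processed text is under 100 chars.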
Re: Distributed search capability
> Yes, and rightfully so - it didn't handle properly some query types, so
> you would actually get wrong results.

That's bad!

> roll your own (and contribute it back!) if you are more advanced ;)

Wouldn't roll your own basically mean resurrecting the previous implementation of MultiSearcher? Ie, what would be different?

On Thu, Jun 9, 2011 at 4:07 PM, Andrzej Bialecki a...@getopt.org wrote: