RE: Proposal Status, Initial Committers List, Contributors List
I'll send my CLA tomorrow. How can I check my CLA status? There were long holidays in Russia (11 days) and I completely missed the whole process. Sorry. Where can I register and fill in my info as a Lucene.NET developer?

--
Regards, Sergey Mirvoda
[jira] Commented: (LUCENE-2547) minimize autoboxing in NumericField
[ https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982519#action_12982519 ]

Shai Erera commented on LUCENE-2547:
------------------------------------

bq. However, in my application, I already have Long/Integer and I have to unbox/rebox pointlessly.

I didn't dive deep into the details of this issue, but what will someone who has only long/int (and not their boxed counterparts) do? Will he need to create a Long/Integer out of them? I think more often than not, people use primitives and not their boxed counterparts. I wouldn't want the Lucene API to suddenly require me to allocate a Long (whether explicitly or by autoboxing) just because someone wanted to avoid using primitives...

My experience with the NumericField API has been very good so far - I didn't find it cumbersome or less performant. IMO we should handle primitives at the API level wherever we can and use objects as little as possible.

minimize autoboxing in NumericField
-----------------------------------

Key: LUCENE-2547
URL: https://issues.apache.org/jira/browse/LUCENE-2547
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 3.0.2
Reporter: Woody Anderson
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2547.patch

If you already have an Integer/Long/Double etc., numericField.setLongValue(long) causes an unnecessary auto-unbox. Actually, since internal to setLongValue there is:

{code}
fieldsData = Long.valueOf(value);
{code}

there is an explicit box anyway, so this makes setLongValue(Long) with an auto-box of long roughly the same as setLongValue(long), but better if you started with a Long. Long being replaceable with Integer, Float, Double etc.
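[Editorial note: a minimal sketch of the round-trip under discussion. NumericField and setLongValue(long) are the real 3.x API; the comments restate the issue's claim about its internals, and the Long-taking overload is the proposal, not existing API.]

{code}
import org.apache.lucene.document.NumericField;

public class BoxingRoundTrip {
  public static void main(String[] args) {
    Long stored = 1295222400000L; // value already held as a Long, e.g. from a JDBC driver

    NumericField field = new NumericField("timestamp");
    // setLongValue(long) auto-unboxes 'stored' here...
    field.setLongValue(stored);
    // ...and, per the issue, the primitive is immediately boxed back internally:
    //     fieldsData = Long.valueOf(value);
    // The proposed setLongValue(Long) overload would skip both conversions
    // whenever the caller already holds a Long.
  }
}
{code}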
[jira] Commented: (SOLR-849) Add bwlimit support to snappuller
[ https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982526#action_12982526 ]

Koji Sekiguchi commented on SOLR-849:
-------------------------------------

I hadn't noticed this ticket. This seems to have been added to rsyncd-start in SOLR-2099.

Add bwlimit support to snappuller
---------------------------------

Key: SOLR-849
URL: https://issues.apache.org/jira/browse/SOLR-849
Project: Solr
Issue Type: Improvement
Components: replication (scripts)
Reporter: Otis Gospodnetic
Priority: Minor
Attachments: SOLR-849.patch

From http://markmail.org/message/njnbh5gbb2mvfe24
[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982529#action_12982529 ]

Shai Erera commented on LUCENE-2295:
------------------------------------

I think the changes to 3x are less complicated than they seem - we don't need to deprecate anything more than we already did. IndexWriterConfig is introduced in 3.1 and all IW ctors are already deprecated. So we can just remove the get/setMaxFieldLength from IWC and be done with it, plus some javadocs. Is that the intention behind the reopening of the issue?

Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
---------------------------------------------------------------------------------------------------------------------------------------

Key: LUCENE-2295
URL: https://issues.apache.org/jira/browse/LUCENE-2295
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Shai Erera
Assignee: Uwe Schindler
Fix For: 3.1, 4.0
Attachments: LUCENE-2295-trunk.patch, LUCENE-2295.patch

A spinoff from LUCENE-2294. Instead of asking the user to specify his requested MFL limit on IndexWriter, we can get rid of this setting entirely by providing an Analyzer which will wrap any other Analyzer and its TokenStream with a TokenFilter that keeps track of the number of tokens produced and stops when the limit has been reached. This will remove any count tracking in IW's indexing, which is done even if I specified UNLIMITED for MFL. Let's try to do it for 3.1.
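[Editorial note: the description amounts to a token-counting TokenFilter. A minimal sketch of such a filter against the 3.x TokenStream API follows; the class name and the exact end-of-stream behavior are assumptions, not the attached patch.]

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Passes tokens through until maxTokenCount is reached, then behaves as if
// the underlying stream were exhausted.
public final class MaxTokenCountFilter extends TokenFilter {
  private final int maxTokenCount;
  private int count;

  public MaxTokenCountFilter(TokenStream in, int maxTokenCount) {
    super(in);
    this.maxTokenCount = maxTokenCount;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (count >= maxTokenCount) {
      return false; // limit hit: stop producing tokens
    }
    if (input.incrementToken()) {
      count++;
      return true;
    }
    return false;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    count = 0; // allow the stream to be reused
  }
}
{code}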
[jira] Updated: (LUCENE-2755) Some improvements to CMS
[ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2755:
-------------------------------

Attachment: LUCENE-2755.patch

Patch includes some formatting changes and a documentation addition. I'm not sure if eventually we will be able to refactor the whole MP-MS-IW interaction like we said. Earwin, if you still want to work on it, then I can keep the issue open and mark it 3.2 (unless you want to give it a try in 3.1). And I think those tiny mods/formatting are worth checking in, because they at least add some documentation to CMS.

Some improvements to CMS
------------------------

Key: LUCENE-2755
URL: https://issues.apache.org/jira/browse/LUCENE-2755
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.1, 4.0
Attachments: LUCENE-2755.patch

While running optimize on a large index, I've noticed several things that got me to read CMS code more carefully, and find these issues:

* CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads taking merges from the IndexWriter until they are exhausted, and only then that blocked merge will run. I think it's unnecessary that that merge will be blocked.
* CMS sorts merges by segment size, doc-based and not bytes-based. Since the default MP is LogByteSizeMP, and I hardly believe people care about doc-based segment sizes anymore, I think we should switch the default impl. There are two ways to make it extensible, if we want (a sketch of the bytes-based ordering follows this message):
** Have an overridable member/method in CMS that you can extend and override - easy.
** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs, calibrated deletes, etc.). Better, but will need to tap into several places in the code, so more risky and complicated.

On the go, I'd like to add some documentation to CMS - it's not very easy to read and follow. I'll work on a patch.
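[Editorial note: to make the second bullet concrete, a sketch of a bytes-based merge ordering against a made-up PendingMerge holder. The real type is MergePolicy.OneMerge; the field name and sort direction here are placeholders, not claims about what CMS currently does.]

{code}
import java.util.Comparator;

public class MergeOrdering {
  // Stand-in for MergePolicy.OneMerge, carrying only what the ordering needs.
  public static class PendingMerge {
    final long sizeInBytes; // sum of the merge's segment sizes in bytes

    PendingMerge(long sizeInBytes) {
      this.sizeInBytes = sizeInBytes;
    }
  }

  // Bytes-based ordering, matching LogByteSizeMergePolicy's notion of size,
  // instead of a doc-count-based sort.
  static final Comparator<PendingMerge> BY_BYTES = new Comparator<PendingMerge>() {
    public int compare(PendingMerge a, PendingMerge b) {
      return a.sizeInBytes < b.sizeInBytes ? -1
           : a.sizeInBytes > b.sizeInBytes ? 1 : 0;
    }
  };
}
{code}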
[jira] Commented: (LUCENE-2755) Some improvements to CMS
[ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982564#action_12982564 ]

Earwin Burrfoot commented on LUCENE-2755:
-----------------------------------------

bq. if you still want to work on it, then I can keep the issue open and mark it 3.2 (unless you want to give it a try in 3.1).

I'll start another later, so please, go on.

Some improvements to CMS
------------------------

Key: LUCENE-2755
URL: https://issues.apache.org/jira/browse/LUCENE-2755
Re: Odd test failure, looking for pointers.
On Sun, Jan 16, 2011 at 10:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> I'm working on a patch for SOLR-445, and it's near completion. The problem is that I'm getting weird test failures. TestDistributedSearch fails *only* when run as part of the full ant test, *not* when I run it either from the command line (-Dtestcase=) or from within IntelliJ. So I assume it's some interesting interaction between some previous test and the one in question. Before I go and try to figure it out, does anyone have any wisdom to offer as to (1) how to go about tracking it down?

When the test fails, you should see something like this (assuming TestG):

[junit] NOTE: all tests run in this JVM:
[junit] [TestA, TestB, TestC, TestD, TestE, TestF, TestG]

So then hack your build.xml file: remove the junit definition for testpackage and replace it with

  <batchtest fork="yes" todir="${junit.output.dir}" if="testpackage">
    <fileset dir="src/test" includes="**/TestA* **/TestB* **/TestC* **/TestD* **/TestE* **/TestF* **/TestG*"/>
  </batchtest>

Now you can run just this group in a single thread with -Dtestpackage=1 -Dtests.threadspercpu=1. As long as the test fails, basically binary-search the list to find the offending test by editing testpackage; e.g. you should be able to reduce it to D,E,F,G, then D,E,G, then D,G to find out it was D that was the problem interfering with G.
[jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982594#action_12982594 ]

Robert Muir commented on LUCENE-2295:
-------------------------------------

Hi Shai, that sounds like the right solution to me!

Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
---------------------------------------------------------------------------------------------------------------------------------------

Key: LUCENE-2295
URL: https://issues.apache.org/jira/browse/LUCENE-2295
Re: Release schedule Lucene 4?
On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>> But: they don't yet support updating the values (the goal is to allow this, eventually). This is just the first step. No?
>
> Hmm... I thought that was a main part of the functionality?

Patches welcome ;) Seriously, how would you do it? I.e., I don't like how norms handle it today -- on changing a single value we must write the full array (for all docs). Same problem w/ del docs, though since it's 1 bit per doc the cost is far less.

Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value), and at init we load the base and apply all the diffs (in order). Merging would periodically coalesce them down again...

Mike
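[Editorial note: a toy sketch of that stacked idea, with all names invented. The base array stays write-once, updates accumulate as sparse docID -> value pairs, and a read consults the overlay before falling back to the base, so a point lookup costs one extra hash probe.]

  import java.util.HashMap;
  import java.util.Map;

  // Toy model of stacked per-document values: an immutable base array plus a
  // sparse overlay of updates. A merge would periodically fold the overlay
  // back into a fresh base array.
  public class StackedLongValues {
    private final long[] base;                                  // written once at flush
    private final Map<Integer, Long> overlay = new HashMap<Integer, Long>();

    public StackedLongValues(long[] base) {
      this.base = base;
    }

    public void update(int docID, long value) {
      overlay.put(docID, value);                                // record only the delta
    }

    public long get(int docID) {
      Long updated = overlay.get(docID);
      return updated != null ? updated.longValue() : base[docID];
    }
  }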
[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982597#action_12982597 ]

Michael McCandless commented on LUCENE-2666:
--------------------------------------------

OK thanks. Hopefully we can catch this under infoStream's watch.

Not calling prepareCommit is harmless -- IW simply calls it for you under the hood when commit() is called, if you hadn't already called prepareCommit(). The two APIs are separate in case you want to involve Lucene in a two-phased commit w/ other resources.

ArrayIndexOutOfBoundsException when iterating over TermDocs
-----------------------------------------------------------

Key: LUCENE-2666
URL: https://issues.apache.org/jira/browse/LUCENE-2666
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 3.0.2
Reporter: Shay Banon

A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an ArrayIndexOutOfBoundsException. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
    at org.apache.lucene.util.BitVector.get(BitVector.java:104)
    at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
    at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
    at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
    at TestMe.main(TestMe.java:56)

It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del

And as you can see, it smells like a corner case (it fails for document number 912; the AIOOB happens from the deleted docs). The code to recreate it is simple:

FSDirectory dir = FSDirectory.open(new File("index"));
IndexReader reader = IndexReader.open(dir, true);
IndexReader[] subReaders = reader.getSequentialSubReaders();
for (IndexReader subReader : subReaders) {
  Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
  field.setAccessible(true);
  SegmentInfo si = (SegmentInfo) field.get(subReader);
  System.out.println("-- " + si);
  if (si.getDocStoreSegment().contains("_26t")) {
    // this is the problematic one...
    System.out.println("problematic one...");
    FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER);
  }
}

Here is the result of a check index on that segment:

8 of 10: name=_26t docCount=914
  compound=true
  hasProx=true
  numFiles=2
  size (MB)=1.641
  diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
  has deletions [delFileName=_26t_1.del]
  test: open reader.........OK [1 deleted docs]
  test: fields..............OK [32 fields]
  test: field norms.........OK [32 fields]
  test: terms, freq, prox...ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
    at org.apache.lucene.util.BitVector.get(BitVector.java:104)
    at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
    at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
    at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
    at TestMe.main(TestMe.java:47)
  test: stored fields.......ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
    at org.apache.lucene.util.BitVector.get(BitVector.java:104)
    at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
    at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
    at TestMe.main(TestMe.java:47)
  test: term vectors........ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
    at org.apache.lucene.util.BitVector.get(BitVector.java:104)
    at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
    at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
[jira] Resolved: (LUCENE-2768) add infrastructure for longer running nightly test cases
[ https://issues.apache.org/jira/browse/LUCENE-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2768.
----------------------------------------

Resolution: Fixed
Fix Version/s: 3.1

I think this is fixed.

add infrastructure for longer running nightly test cases
---------------------------------------------------------

Key: LUCENE-2768
URL: https://issues.apache.org/jira/browse/LUCENE-2768
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 3.1, 4.0
Attachments: europarl.lines.txt.gz, europarl.py, LUCENE-2768.patch, LUCENE-2768.patch, LUCENE-2768.patch, LUCENE-2768_nightly.patch, LUCENE-2768_nightly.patch

I'm spinning this out of LUCENE-2762... The patch there adds initial infrastructure for tests to pull documents from a line file, and adds a longish running test case using that line file to test NRT. I'd like to see some tests run on more substantial indices based on real data... so this is just a start.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982604#action_12982604 ]

Michael McCandless commented on LUCENE-2474:
--------------------------------------------

Ahh, I get it -- invoking the listeners (on cache evict) is dangerous to do under a global lock since they could conceivably be costly.

I had switched to Set to try to prevent silliness in the event that an app adds the same listener over and over (w/o removing it), and also to not have O(N^2) cost when removing listeners. I mean, it is an expert API, but I still think we should attempt to be defensive against silliness? How about CHM? (There is no builtin CHS, right? And HashSet just wraps a HashMap anyway.)

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
------------------------------------------------------------------------------------------------------------------------------------

Key: LUCENE-2474
URL: https://issues.apache.org/jira/browse/LUCENE-2474
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Shay Banon
Attachments: LUCENE-2474.patch, LUCENE-2474.patch

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example, with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows to plug in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
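[Editorial note: on the "no builtin CHS" point, since Java 6 the JDK can derive a concurrent Set from CHM via Collections.newSetFromMap. A minimal sketch of a listener registry built that way; the registry and listener names are illustrative, not the patch's actual code.]

{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ListenerRegistry {
  // Set semantics (duplicate adds are no-ops, cheap removal) backed by CHM,
  // so no global lock is held while listeners are being invoked.
  private final Set<CacheEvictionListener> listeners =
      Collections.newSetFromMap(new ConcurrentHashMap<CacheEvictionListener, Boolean>());

  public void add(CacheEvictionListener listener) {
    listeners.add(listener); // adding the same listener twice is harmless
  }

  public void remove(CacheEvictionListener listener) {
    listeners.remove(listener); // O(1), not O(N)
  }

  public void notifyEviction() {
    // Iteration is weakly consistent: safe alongside concurrent add/remove.
    for (CacheEvictionListener listener : listeners) {
      listener.onEviction();
    }
  }

  // Listener shape assumed from the issue title (illustrative).
  public interface CacheEvictionListener {
    void onEviction();
  }
}
{code}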
[jira] Commented: (LUCENE-2738) improve test coverage for omitNorms and omitTFAP
[ https://issues.apache.org/jira/browse/LUCENE-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982607#action_12982607 ]

Robert Muir commented on LUCENE-2738:
-------------------------------------

Mike just reminded me about this one: my concern for not committing is that we would actually reduce test coverage, because most tests will create, say, field "foobar" in a loop like this:

{noformat}
for () {
  newField("foobar");
}
{noformat}

So because removing norms/omitTFAP is infectious, I think we will end up only testing certain cases... unless we change the patch so that this random value is remembered per field name for the length of the test... I think that's the right solution (adding a hashmap).

improve test coverage for omitNorms and omitTFAP
------------------------------------------------

Key: LUCENE-2738
URL: https://issues.apache.org/jira/browse/LUCENE-2738
Project: Lucene - Java
Issue Type: Test
Components: Build
Reporter: Robert Muir
Fix For: 4.0
Attachments: LUCENE-2738.patch, LUCENE-2738.patch, LUCENE-2738.patch

Just expands on what LuceneTestCase does already... if you say ANALYZED_NO_NORMS, we might set norms anyway. In the same sense, if you say Index.NO, we might index it anyway, and might set omitTFAP etc.
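[Editorial note: a minimal sketch of the per-field memory being proposed - draw the random choice the first time a field name is seen, then reuse it, so repeated newField("foobar") calls stay consistent within one test run. Names are invented; this is not the actual LuceneTestCase patch.]

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class PerFieldRandomChoices {
  private final Map<String, Boolean> omitNormsByField = new HashMap<String, Boolean>();
  private final Random random;

  public PerFieldRandomChoices(Random random) {
    this.random = random;
  }

  // First call for a field name draws the random choice; later calls reuse it,
  // so a field created in a loop keeps one consistent configuration.
  public boolean omitNorms(String fieldName) {
    Boolean choice = omitNormsByField.get(fieldName);
    if (choice == null) {
      choice = Boolean.valueOf(random.nextBoolean());
      omitNormsByField.put(fieldName, choice);
    }
    return choice.booleanValue();
  }
}
{code}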
[jira] Resolved: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-1677.
-------------------------------

Resolution: Fixed

I think this issue has been resolved for some time.

Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
-------------------------------------------------------------------------------------------

Key: SOLR-1677
URL: https://issues.apache.org/jira/browse/SOLR-1677
Project: Solr
Issue Type: Sub-task
Components: Schema and Analysis
Reporter: Uwe Schindler
Fix For: 3.1, 4.0
Attachments: SOLR-1677-lucenetrunk-branch-2.patch, SOLR-1677-lucenetrunk-branch-3.patch, SOLR-1677-lucenetrunk-branch.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch

Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. In Lucene 3.0 this matchVersion ctor parameter is mandatory, and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.

This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a subclass of Java 5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene).

This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.
[jira] Resolved: (SOLR-2169) QueryElevationComponentTest.testInterface test failure
[ https://issues.apache.org/jira/browse/SOLR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2169.
-------------------------------

Resolution: Not A Problem

Marking not a problem; appears to be fixed with the solr test cleanup.

QueryElevationComponentTest.testInterface test failure
-------------------------------------------------------

Key: SOLR-2169
URL: https://issues.apache.org/jira/browse/SOLR-2169
Project: Solr
Issue Type: Bug
Components: Build
Affects Versions: 3.1, 4.0
Environment: Hudson
Reporter: Robert Muir
Fix For: 3.1, 4.0

Stacktrace:
{noformat}
[junit] Testsuite: org.apache.solr.handler.component.QueryElevationComponentTest
[junit] Testcase: testInterface(org.apache.solr.handler.component.QueryElevationComponentTest): Caused an ERROR
[junit] Exception during query
[junit] java.lang.RuntimeException: Exception during query
[junit]   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:343)
[junit]   at org.apache.solr.handler.component.QueryElevationComponentTest.testInterface(QueryElevationComponentTest.java:100)
[junit]   at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:873)
[junit]   at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:840)
[junit] Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0']
[junit] xml response was: <?xml version="1.0" encoding="UTF-8"?>
[junit] <response>
[junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int><lst name="params"><str name="q.alt">*:*</str><str name="qt">/elevate</str><str name="defType">dismax</str></lst></lst><result name="response" numFound="6" start="0"><doc><str name="id">a</str><arr name="str_s"><str>a</str></arr><str name="title">ipod</str></doc><doc><str name="id">b</str><arr name="str_s"><str>b</str></arr><str name="title">ipod ipod</str></doc><doc><str name="id">c</str><arr name="str_s"><str>c</str></arr><str name="title">ipod ipod ipod</str></doc><doc><str name="id">x</str><arr name="str_s"><str>x</str></arr><str name="title">boosted</str></doc><doc><str name="id">y</str><arr name="str_s"><str>y</str></arr><str name="title">boosted boosted</str></doc><doc><str name="id">z</str><arr name="str_s"><str>z</str></arr><str name="title">boosted boosted boosted</str></doc></result>
[junit] </response>
[junit]
[junit] request was: q.alt=*:*&qt=/elevate&defType=dismax
[junit]   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:336)
[junit]
[junit]
[junit] Tests run: 3, Failures: 0, Errors: 1, Time elapsed: 0.581 sec
[junit]
[junit] ------------- Standard Output ---------------
[junit] NOTE: reproduce with: ant test -Dtestcase=QueryElevationComponentTest -Dtestmethod=testInterface -Dtests.seed=8921358208309552689:278255616409435903
[junit] NOTE: test params are: codec=MockSep, locale=fr, timezone=America/Indiana/Vevay
[junit] ------------- ---------------- ---------------
[junit] ------------- Standard Error -----------------
[junit] 17 oct. 2010 04:10:28 org.apache.solr.SolrTestCaseJ4 assertQ
[junit] GRAVE: REQUEST FAILED: xpath=//*[@numFound='0']
[junit] xml response was: <?xml version="1.0" encoding="UTF-8"?>
[junit] <response>
[junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int><lst name="params"><str name="q.alt">*:*</str><str name="qt">/elevate</str><str name="defType">dismax</str></lst></lst><result name="response" numFound="6" start="0"><doc><str name="id">a</str><arr name="str_s"><str>a</str></arr><str name="title">ipod</str></doc><doc><str name="id">b</str><arr name="str_s"><str>b</str></arr><str name="title">ipod ipod</str></doc><doc><str name="id">c</str><arr name="str_s"><str>c</str></arr><str name="title">ipod ipod ipod</str></doc><doc><str name="id">x</str><arr name="str_s"><str>x</str></arr><str name="title">boosted</str></doc><doc><str name="id">y</str><arr name="str_s"><str>y</str></arr><str name="title">boosted boosted</str></doc><doc><str name="id">z</str><arr name="str_s"><str>z</str></arr><str name="title">boosted boosted boosted</str></doc></result>
[junit] </response>
[junit]
[junit] request was: q.alt=*:*&qt=/elevate&defType=dismax
[junit] 17 oct. 2010 04:10:28 org.apache.solr.common.SolrException log
[junit] GRAVE: REQUEST FAILED: q.alt=*:*&qt=/elevate&defType=dismax:java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='0']
[junit] xml response was: <?xml version="1.0" encoding="UTF-8"?>
[junit] <response>
[junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int><lst name="params"><str name="q.alt">*:*</str><str name="qt">/elevate</str><str name="defType">dismax</str></lst></lst><result name="response" numFound="6" start="0"><doc><str name="id">a</str><arr name="str_s"><str>a</str></arr><str name="title">ipod</str></doc><doc><str name="id">b</str><arr name="str_s"><str>b</str></arr><str name="title">ipod ipod</str></doc><doc><str name="id">c</str><arr name="str_s"><str>c</str></arr><str name="title">ipod ipod ipod</str></doc><doc><str name="id">x</str><arr name="str_s"><str>x</str></arr><str
Re: Odd test failure, looking for pointers.
Robert:

Thanks, I had a general idea that was the approach, but it's great to have someone point the way in detail...

Erick
Re: Odd test failure, looking for pointers.
On Mon, Jan 17, 2011 at 7:40 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Robert:
> Thanks, I had a general idea that was the approach, but it's great to have someone point the way in detail...
> Erick

Another thing to consider: it might not be test meddling at all. It might just be some concurrency bug, and when running the full 'ant test' your machine is busier because of multiple JVMs going at the same time... so you can also try making your computer really busy (e.g. running the lucene tests) and at the same time running the test by itself.
Re: Release schedule Lucene 4?
This sounds like incremental field updates :).

Shai

On Mon, Jan 17, 2011 at 1:24 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value), and at init we load the base and apply all the diffs (in order). Merging would periodically coalesce them down again...
>
> Mike
[jira] Updated: (SOLR-2259) Improve analyzer/version handling in Solr
[ https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2259:
------------------------------

Attachment: SOLR-2259part4.patch

Here is the patch for the last part, part 4. I added a warnDeprecated() helper method to the base class, and added messages for all deprecated classes in trunk.

Improve analyzer/version handling in Solr
-----------------------------------------

Key: SOLR-2259
URL: https://issues.apache.org/jira/browse/SOLR-2259
Project: Solr
Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.1, 4.0
Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259_part3.patch, SOLR-2259part2.patch, SOLR-2259part4.patch

We added Version for backwards-compatibility support in Lucene. We use this to fire deprecated code to emulate an old version, to ensure index backwards compat. Related: we deprecate old analysis components and eventually remove them. To hook into Solr, at first it defaulted to Version 2.4 emulation everywhere, with the example having the latest; if you don't specify a version in your solrconfig, it defaults to 2.4 though. However, as of LUCENE-2781, 2.4 is removed: but users with old configs that don't specify a version should not be silently upgraded to Version 3.0 emulation... this is bad. Additionally, when users are using deprecated emulation or deprecated factories they might not know it, and it might come as a surprise if they upgrade, especially if they aren't looking at java apis or java code. I propose:

# In trunk: we make the solrconfig luceneMatchVersion mandatory. This is simple: Uwe already has a method that will error out if it's not present, we just use that.
# In 3.x: we warn if you don't specify luceneMatchVersion in solrconfig, telling you that it's going to be required in 4.0 and that you are defaulting to 2.4 emulation. For example: "Warning: luceneMatchVersion is not specified in solrconfig.xml. Defaulting to 2.4 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0. This parameter will be mandatory in 4.0."
# In 3.x, trunk: we warn if you are using a deprecated matchVersion constant somewhere in general, even for a specific tokenizer, telling you that you need to at some point reindex with a current version before you can move to the next release. For example: "Warning: you are using 2.4 emulation; at some point you need to bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0."
# In 3.x, trunk: we warn if you are using a deprecated TokenStreamFactory so that you know it's going to be removed. For example: "Warning: the ISOLatin1FilterFactory is deprecated and will be removed in the next release. You should migrate to ASCIIFoldingFilterFactory."
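[Editorial note: the patch itself is not shown in this message, so purely as an illustration, a helper of the kind described might look like the following. The class, logger, and signature are all guesses, not the actual SOLR-2259 code.]

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of a warnDeprecated() helper on a factory base class.
public abstract class BaseTokenFilterFactory {
  private static final Logger log = LoggerFactory.getLogger(BaseTokenFilterFactory.class);

  // Called from a deprecated factory's init so the user sees the warning in
  // their logs instead of discovering the removal at upgrade time.
  protected final void warnDeprecated(String message) {
    log.warn("{} is deprecated. {}", getClass().getSimpleName(), message);
  }
}
{code}

A deprecated factory would then call, e.g., warnDeprecated("Use ASCIIFoldingFilterFactory instead.").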
[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level
[ https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982635#action_12982635 ]

Robert Muir commented on LUCENE-2236:
-------------------------------------

bq. Is that too bad?

Well, my concern about the deprecated methods is that we get into the hairy backwards-compat situation... we already had issues with this with Similarity. It might be ok to essentially fix Similarity to be the way we want for 4.0 (break it) since it's an expert API anyway. This patch was just a quick stab... I definitely agree with you about the name though; I prefer Similarity.

bq. should Sim be aware of for which field it was created, so that no need to pass it as parameter in its methods in case this is ever important?

Well, honestly I think what you are saying is really needed for the future... but I would prefer to actually delay that until a future patch :)

Making an optimized TermScorer is becoming more and more complicated; see the one in the bulkpostings branch for example. Because of this, it's extremely tricky to customize the scoring with good performance. I think the score caching etc. in TermScorer needs to be moved out of TermScorer; instead the responsibility of calculating the score should reside in Similarity, including any caching it needs to do (which is really impl-dependent). Basically Similarity needs to be responsible for score(), but let TermScorer etc. deal with enumerating postings.

For example, we now have the stats totalTermFreq/totalCollectionFreq by field for a term, but you can't e.g. take these and make a language-modelling-based scorer, which you should be able to do *right now*, except for limitations in our APIs.

So in a future issue I would like to propose a patch to do just this, so that TermScorer, for example, is more general. Similarity would need to be able to 'set up' a query (e.g. things like IDF, building score caches for the query, whatever), and then also score an individual document. In the flexible scoring prototype this is what we did, but we went even further, where a Similarity is also responsible for 'setting up' a searcher, too. So that means it's responsible for managing the norms byte[] (in that patch, you only had a byte[] of norms if you made it in your Similarity yourself). I think long term that approach is definitely really interesting, but I think we can go ahead and make scoring a lot more flexible in tiny steps like this without rewriting all of lucene in one enormous patch... and this is safer as we can benchmark performance each step of the way.

Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level
--------------------------------------------------------------------------------------------------

Key: LUCENE-2236
URL: https://issues.apache.org/jira/browse/LUCENE-2236
Project: Lucene - Java
Issue Type: Improvement
Components: Query/Scoring
Affects Versions: 3.0
Reporter: Paul taylor
Assignee: Robert Muir
Attachments: LUCENE-2236.patch

Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level. To facilitate this, could we make the field name available to all score methods? Currently it is only passed to some, such as lengthNorm(), but not others, such as tf().
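[Editorial note: to illustrate the asymmetry the issue describes - in 3.x, lengthNorm() does receive the field name, so per-field behavior is already possible there, while tf() does not. A minimal sketch of a field-aware Similarity using the hook that exists; the "title" field is an arbitrary example.]

{code}
import org.apache.lucene.search.DefaultSimilarity;

// Disables length normalization for the "title" field only; every other
// field keeps the default behavior. No equivalent hook exists for tf(),
// which is exactly the gap this issue raises.
public class PerFieldLengthNormSimilarity extends DefaultSimilarity {
  @Override
  public float lengthNorm(String fieldName, int numTokens) {
    if ("title".equals(fieldName)) {
      return 1.0f; // flat norm: title length does not affect scoring
    }
    return super.lengthNorm(fieldName, numTokens);
  }
}
{code}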
Re: Release schedule Lucene 4?
Hi Mike, all -- a (sorrily slow) thanks for this response ;) From the ensuing discussion, it sounds like there's a LOT to be in v4, and not raising wrong expectations by giving dates is appreciated ;) Only thing is, are we talking any time in 2012 or 2011, just to have a coarse-grained estimate without any assumptions attached?

Best

gregor

On 1/15/11 3:20 PM, Michael McCandless wrote:
> This is unfortunately hard to say!
>
> There's tons of good stuff in 4.0, so we'd really like to release sooner rather than later. But then there's also a lot of work remaining, e.g. we have 3 feature branches in flight right now that we need to wrap up and land on trunk:
>
> * realtime (gives us concurrent flushing during indexing)
> * docvalues (adds column-stride fields)
> * bulkpostings (gives good search speedup for intblock codecs)
>
> Plus many open Jira issues. So it's hard to predict when all of this will be done.
>
> Mike
>
> On Fri, Jan 14, 2011 at 12:31 PM, Gregor Heinrich <gre...@arbylon.net> wrote:
>> Dear Lucene team,
>> I am wondering whether there is an updated Lucene release schedule for the v4.0 stream. Any earliest/latest alpha/beta/stable date? And if not yet, where to track such info?
>> Thanks in advance from Germany
>> gregor
Re: Release schedule Lucene 4?
On Mon, Jan 17, 2011 at 12:24 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>> But: they don't yet support updating the values (the goal is to allow this, eventually). This is just the first step. No?
>> Hmm... I thought that was a main part of the functionality?
>
> Patches welcome ;) Seriously, how would you do it? I.e., I don't like how norms handle it today -- on changing a single value we must write the full array (for all docs). Same problem w/ del docs, though since it's 1 bit per doc the cost is far less.

For some implementations writing the value directly would be possible, though. For instance, for StraightFixedBytes and maybe DerefFixedBytes (depending on how it's indexed) we could change the value without writing the entire array. Yet, this would violate the write-once policy!

Having this feature in Lucene and having the values updatable is babystep vs. dream. Jason, again: patches welcome, but let us first land it on trunk.

simon

> Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value), and at init we load the base and apply all the diffs (in order). Merging would periodically coalesce them down again...
>
> Mike
[jira] Updated: (SOLR-2287) (SolrCloud) Allow users to query by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Cowell updated SOLR-2287:
------------------------------

Attachment: SOLR-2287.patch

Added a test class which tests basic functionality for 3 collections but should be expanded upon.

(SolrCloud) Allow users to query by multiple collections
---------------------------------------------------------

Key: SOLR-2287
URL: https://issues.apache.org/jira/browse/SOLR-2287
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Soheb Mahmood
Priority: Minor
Attachments: SOLR-2287.patch, SOLR-2287.patch

This code fixes the todo items mentioned on the SolrCloud wiki:

- optionally allow user to query by collection
- optionally allow user to query by multiple collections (assume schemas are compatible)

We are putting up a patch to see if anyone has any trouble with this code and/or any comments on how to improve it. Unfortunately, as of now, we don't have a test class, as we are still working on it. We are sorry about this.
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982651#action_12982651 ]

Salman Akram commented on SOLR-1604:
------------------------------------

I am trying to use CommonGrams with this patch, but it doesn't seem to work. If I don't add {!complexphrase}, it uses CommonGramsQueryFilterFactory and proper bi-grams are made, but of course it doesn't use this patch. If I add {!complexphrase}, it simply does it the old way, i.e. ignores CommonGrams. Can you please help with how I can combine these two features?

Wildcards, ORs etc inside Phrase Queries
----------------------------------------

Key: SOLR-1604
URL: https://issues.apache.org/jira/browse/SOLR-1604
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
Fix For: Next
Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch

Solr plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
Re: Release schedule Lucene 4?
> Seriously, how would you do it?

Ah, for LUCENE-2312 we don't need to update existing values, we only need to make additions; i.e., it's not the general use case. I got the impression that DocValues should be used instead of CSF? Does CSF replace the FieldCache usage entirely?

> Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value)

What is the lookup cost using this method?
[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982216#action_12982216 ]

Gunnlaugur Thor Briem edited comment on SOLR-1191 at 1/17/11 10:23 AM:
-----------------------------------------------------------------------

I added a patch to resolve this. It resolves deltaQuery columns against pk when they differ by prefix (and report error more helpfully when no column matches, or more than one column matches). No unit test, sorry (but there's not much deltaQuery coverage anyway). All existing unit tests pass, and this is working fine for me in production.

was (Author: gthb):

Patch to resolve this. It resolves deltaQuery columns against pk when they differ by prefix (and report error more helpfully when no column matches, or more than one column matches). No unit test, sorry (but there's not much deltaQuery coverage anyway). All existing unit tests pass, and this is working fine for me in production.

NullPointerException in delta import
------------------------------------

Key: SOLR-1191
URL: https://issues.apache.org/jira/browse/SOLR-1191
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
Environment: OS: Windows & Linux. Java: 1.6 DB: MySQL & SQL Server
Reporter: Ali Syed
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1191.patch

Seeing a few of these NullPointerExceptions during delta imports. Once this happens, delta import stops working and keeps giving the same error.

java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)

Running delta import for a particular entity fixes the problem and delta import starts working again. Here is the log just before & after the exception:

05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-import&optimize=false} status=0 QTime=0
05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties
05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import
05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties
05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection.
05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: content
05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: content rows obtained : 0
05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: content rows obtained : 0
05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: content
05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: job
05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
05/27 11:59:29 86987704 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 12
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: job rows obtained : 0
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: job rows obtained : 0
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: job
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Delta Import completed successfully
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection.
05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for
[jira] Updated: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunnlaugur Thor Briem updated SOLR-1191:
----------------------------------------

Comment: was deleted (was: Neglected to mention: that patch is against branch_3x.)

NullPointerException in delta import
------------------------------------

Key: SOLR-1191
URL: https://issues.apache.org/jira/browse/SOLR-1191
[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982216#action_12982216 ]

Gunnlaugur Thor Briem edited comment on SOLR-1191 at 1/17/11 10:24 AM:
-----------------------------------------------------------------------

I added a patch against branch_3x to resolve this. It resolves deltaQuery columns against pk when they differ by prefix (and report error more helpfully when no column matches, or more than one column matches). No unit test, sorry (but there's not much deltaQuery coverage anyway). All existing unit tests pass, and this is working fine for me in production.

was (Author: gthb):

I added a patch to resolve this. It resolves deltaQuery columns against pk when they differ by prefix (and report error more helpfully when no column matches, or more than one column matches). No unit test, sorry (but there's not much deltaQuery coverage anyway). All existing unit tests pass, and this is working fine for me in production.

NullPointerException in delta import
------------------------------------

Key: SOLR-1191
URL: https://issues.apache.org/jira/browse/SOLR-1191
[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982216#action_12982216 ] Gunnlaugur Thor Briem edited comment on SOLR-1191 at 1/17/11 10:25 AM: --- I added a patch against branch_3x to resolve this. It resolves deltaQuery columns against pk when they differ by prefix (and reports the error more helpfully when no column matches, or more than one column matches). No unit test, sorry (but there's not much deltaQuery coverage anyway). All existing unit tests pass, and this is working fine for me in production. was (Author: gthb): I added a patch against branch_3x to resolve this. It resolves deltaQuery columns against pk when they differ by prefix (and report error more helpfully when no column matches, or more than one column matches). No unit test, sorry (but there's not much deltaQuery coverage anyway). All existing unit tests pass, and this is working fine for me in production. NullPointerException in delta import Key: SOLR-1191 URL: https://issues.apache.org/jira/browse/SOLR-1191 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Environment: OS: Windows & Linux. Java: 1.6 DB: MySQL & SQL Server Reporter: Ali Syed Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1191.patch Seeing a few of these NullPointerExceptions during delta imports. Once this happens, delta import stops working and keeps giving the same error. java.lang.NullPointerException at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) Running delta import for a particular entity fixes the problem and delta import starts working again. Here is the log just before and after the exception: 05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-import&optimize=false} status=0 QTime=0 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 
Highlighting overlapping tokens
Hi all, I'm having an issue when highlighting fields that have overlapping tokens. A bug was opened in Jira some years ago (https://issues.apache.org/jira/browse/LUCENE-627), but I'm a bit confused about it: in Jira the bug's status is resolved, yet I still get the exact same problem with an unmodified Lucene 2.9.3. Looking into what was going on, I checked org.apache.lucene.search.highlight.TokenSources, which rebuilds a TokenStream from term vectors, and I found that tokens were not sorted by offset, as one would expect. When sorting tokens, the following comparator is used: public int compare(Object o1, Object o2) { Token t1 = (Token) o1; Token t2 = (Token) o2; if (t1.startOffset() > t2.endOffset()) return 1; if (t1.startOffset() < t2.startOffset()) return -1; return 0; } I'm not sure why endOffset is used instead of startOffset in the first test (it looks like a typo), and with non-overlapping tokens this works just fine. But with overlapping tokens the longest tokens get pushed to the end of their overlapping zone: (big,3,6), (fish,7,11), ({big fish},3,11) end up sorted in this exact order, where I would have expected (big,3,6) ({big fish},3,11) (fish,7,11) or ({big fish},3,11) (big,3,6) (fish,7,11). Highlighting with the term {big fish} builds a fragment by concatenating big, {big fish}, and fish, giving this phrase: big<em>big fish</em> fish. I tested a quick fix by changing the preceding comparator like this: public int compare(Object o1, Object o2) { Token t1 = (Token) o1; Token t2 = (Token) o2; if (t1.startOffset() > t2.startOffset()) return 1; if (t1.startOffset() < t2.startOffset()) return -1; if (t1.endOffset() > t2.endOffset()) return -1; if (t1.endOffset() < t2.endOffset()) return 1; return 0; } Highlight behavior is now correct as far as I have tested it. Maybe the original sorting order has a purpose I don't understand, but to me this slight modification seems to fix everything. What should I do? (I'm very new to this list and this community.) If someone with a better understanding of Lucene highlighting could give me some feedback, I would be grateful. Thanks for your time. Pierre - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Query parser contract changes?
Hi folks, I'm sorely puzzled by the fact that my QParser implementation ceased to work after the latest Solr/Lucene trunk update. My previous update was about ten days ago, right after Mike made his index changes. The symptom is that, although the query parser is correctly called, and seems to have the right arguments, the Query it is returning seems to be ignored. I always get zero results. I eliminated any possibility of error by just hardwiring the return of a TermQuery, and that too always yields zero results. I was able to confirm, using the standard handler with the default query parser, that the index is in fine shape. So I was wondering if the contract for QParser had changed in some subtle way that I missed? Karl
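For reference, a QParser plugin of the general shape being described looks like this (a minimal sketch against the Solr trunk API of the time; the class name and the hardwired field/term are illustrative, not Karl's actual code):
{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class MyQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() {
        // hardwired TermQuery, as in the experiment described above
        return new TermQuery(new Term("value_0", "a"));
      }
    };
  }
}
{code}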
Re: Release schedule Lucene 4?
Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value), and at init we load the base and apply all the diffs (in order). Merging would periodically coalesce them down again... I think this approach would be great for the DF in RT. It's better than a multidimensional array? As the lookup cost won't be too high, and we can instantiate a new main int[] every N. I'll enumerate the options we've gone over in the LUCENE-2312 issue, so we don't forget! On Mon, Jan 17, 2011 at 3:24 AM, Michael McCandless luc...@mikemccandless.com wrote: On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: But: they don't yet support updating the values (the goal is to allow this, eventually). This is just the first step. No? Hmm... I thought that was a main part of the functionality? Patches welcome ;) Seriously, how would you do it? IE, I don't like how norms handle it today -- on changing a single value we must write the full array (for all docs). Same problem w/ del docs, though since its 1 bit per doc the cost is far less. Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value), and at init we load the base and apply all the diffs (in order). Merging would periodically coalesce them down again... Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
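To make the stacked idea concrete, here is a minimal sketch (illustrative names only, not an actual Lucene API): the original full array stays as the base, sparse (docID, newValue) deltas are written separately, each generation of deltas is applied in order at load time, and merging would periodically coalesce base and deltas back into a single array.
{code}
// Illustrative sketch of the stacked-update idea discussed above.
class StackedIntValues {
  private final int[] base; // original full array, one slot per docID

  StackedIntValues(int[] base) {
    this.base = base;
  }

  // Apply one generation of sparse deltas; generations are applied
  // oldest first, so later updates win.
  void applyDeltas(int[] docIDs, int[] newValues) {
    for (int i = 0; i < docIDs.length; i++) {
      base[docIDs[i]] = newValues[i];
    }
  }

  int get(int docID) {
    return base[docID];
  }
}
{code}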
[jira] Updated: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2374: -- Attachment: LUCENE-2374-3x.patch Here is a first patch with the proposed API (thanks Earwin). The patch is for 3.x, as it already contains the sophisticated(TM) backwards compatibility layer (see javadocs). Still missing: - Remove obsolete toString in contrib/queryparser - Test for sophisticated bw - Tests for the API in general - an AttributeChecker test class that checks basic Attribute features and its implementation (copyTo, reflectAsString,...) - Solr changes to make use of this API in analysis.jsp and the other TokenStream components What do you think? Add introspection API to AttributeSource/AttributeImpl -- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2374-3x.patch AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term->foobar, startOffset->Integer.valueOf(0),... - AttributeSource gets the same method; it just concatenates the iterators of each AttributeImpl from getAttributeImplsIterator() No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the entries), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl for toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
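As a rough illustration of how the proposed API would be consumed (contentsIterator() is the method name from the proposal above, not a committed API, and may change before commit):
{code}
import java.util.Iterator;
import java.util.Map;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.AttributeSource;

public class IntrospectionDemo {
  public static void main(String[] args) {
    AttributeSource source = new AttributeSource();
    CharTermAttribute term = source.addAttribute(CharTermAttribute.class);
    OffsetAttribute offset = source.addAttribute(OffsetAttribute.class);
    term.append("foobar");
    offset.setOffset(0, 6);

    // Proposed: iterate structured key-value pairs instead of parsing toString().
    Iterator<Map.Entry<String, ?>> it = source.contentsIterator();
    while (it.hasNext()) {
      Map.Entry<String, ?> entry = it.next();
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
{code}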
[jira] Commented: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982675#action_12982675 ] Uwe Schindler commented on LUCENE-2832: --- I would suggest using a different default for Win64, as the address space is not as small as with 32 bit. How about something like 4 GB or 16 GB? Also, for 32bit we use 1/8 of the possible address space, so why not the same (1/8) for win64? on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory --- Key: LUCENE-2832 URL: https://issues.apache.org/jira/browse/LUCENE-2832 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2832.patch Currently the default max buffer size for MMapDirectory is 256MB on 32bit and Integer.MAX_VALUE on 64bit: {noformat} public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? Integer.MAX_VALUE : (256 * 1024 * 1024); {noformat} But, on 64-bit Windows, you are practically limited to 8TB. This can cause problems in extreme cases, such as: http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher Perhaps it would be good to change this default such that it's 256MB on 32Bit *OR* Windows, but leave it at Integer.MAX_VALUE on other 64-bit (48-bit) systems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
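For anyone hitting this, the limit can already be tuned per directory instead of relying on the default (a sketch; the setter shown is the 3.0-era API, and later versions take the limit as a constructor argument instead):
{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class ChunkSizeDemo {
  public static Directory openIndexDir(String path) throws IOException {
    MMapDirectory dir = new MMapDirectory(new File(path));
    // Map the index in 256 MB chunks, the same limit used on 32-bit JVMs,
    // instead of the 64-bit default of Integer.MAX_VALUE discussed above.
    dir.setMaxChunkSize(256 * 1024 * 1024);
    return dir;
  }
}
{code}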
RE: Query parser contract changes?
Another data point: the standard query parser actually ALSO fails when you do anything other than a *:* query. When you specify a field name, it returns zero results:

root@duck93:/data/solr-dym/solr-dym# curl "http://localhost:8983/solr/nose/standard?q=value_0:a*"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">7</int><lst name="params"><str name="q">value_0:a*</str></lst></lst><result name="response" numFound="0" start="0"/>
</response>

But:

root@duck93:/data/solr-dym/solr-dym# curl "http://localhost:8983/solr/nose/standard?q=*:*"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">244</int><lst name="params"><str name="q">*:*</str></lst></lst><result name="response" numFound="59431646" start="0"><doc><str name="latitude">40.55856</str><str name="longitude">44.37457</str><str name="reference">LANGUAGE=und|TYPE=STREET|ADDR_TOWNSHIP_NAME=Armenia|ADDR_COUNTRY_NAME=Armenia|ADDR_STREET_NAME=A329|TITLE=A329, Armenia, Armenia</str></doc><doc><str name="latitude">40.7703</str><str name="longitude">43.838</str><str name="reference">LANGUAGE=und|TYPE=STREET|ADDR_TOWNSHIP_NAME=Armenia|ADDR_COUNTRY_NAME=Armenia|ADDR_STREET_NAME=A330|TITLE=A330, Armenia ...

The schema has not changed:

<!-- Level 0 non-language value field -->
<field name="othervalue_0" type="string_idx_normed" required="false"/>

...where string_idx_normed is declared in the following way:

<fieldType name="string_idx_normed" class="solr.TextField" indexed="true" stored="false" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

... which shouldn't matter anyway because even a simple TermQuery return from my query parser method doesn't work any more. Karl From: ext karl.wri...@nokia.com [mailto:karl.wri...@nokia.com] Sent: Monday, January 17, 2011 10:30 AM To: dev@lucene.apache.org Subject: Query parser contract changes? Hi folks, I'm sorely puzzled by the fact that my QParser implementation ceased to work after the latest Solr/Lucene trunk update. My previous update was about ten days ago, right after Mike made his index changes. The symptom is that, although the query parser is correctly called, and seems to have the right arguments, the Query it is returning seems to be ignored. I always get zero results. I eliminated any possibility of error by just hardwiring the return of a TermQuery, and that too always yields zero results. I was able to confirm, using the standard handler with the default query parser, that the index is in fine shape. So I was wondering if the contract for QParser had changed in some subtle way that I missed? Karl
[jira] Commented: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982682#action_12982682 ] Uwe Schindler commented on LUCENE-2832: --- Sorry, my last comment was stupid, as 1/8 of 8TB is still larger than Integer.MAX_VALUE (I was thinking of Long.MAX_VALUE). I still have no idea why this fails, as 8 TB of address space should be enough for thousands of 2 GB blocks. on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory --- Key: LUCENE-2832 URL: https://issues.apache.org/jira/browse/LUCENE-2832 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2832.patch Currently the default max buffer size for MMapDirectory is 256MB on 32bit and Integer.MAX_VALUE on 64bit: {noformat} public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? Integer.MAX_VALUE : (256 * 1024 * 1024); {noformat} But, on 64-bit Windows, you are practically limited to 8TB. This can cause problems in extreme cases, such as: http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher Perhaps it would be good to change this default such that it's 256MB on 32Bit *OR* Windows, but leave it at Integer.MAX_VALUE on other 64-bit (48-bit) systems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Let's drop Maven Artifacts !
On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. Steve
[jira] Commented: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982692#action_12982692 ] Robert Muir commented on LUCENE-2832: - In this case, it's very extreme: the user had 1.1 billion documents on one Windows server. I am not sure if this issue will even help anyone at all: will a smaller buffer really help fragmentation in these cases? The user never responded to my suggestion to change the buffer size. I think a good option here is to do nothing at all, but I'm not opposed to reducing the buffer *if* it will actually help, mainly because the MultiMMapIndexInput is sped up and it shouldn't cause as much slowdown as before. on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory --- Key: LUCENE-2832 URL: https://issues.apache.org/jira/browse/LUCENE-2832 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2832.patch Currently the default max buffer size for MMapDirectory is 256MB on 32bit and Integer.MAX_VALUE on 64bit: {noformat} public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? Integer.MAX_VALUE : (256 * 1024 * 1024); {noformat} But, on 64-bit Windows, you are practically limited to 8TB. This can cause problems in extreme cases, such as: http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher Perhaps it would be good to change this default such that it's 256MB on 32Bit *OR* Windows, but leave it at Integer.MAX_VALUE on other 64-bit (48-bit) systems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Let's drop Maven Artifacts !
On Mon, Jan 17, 2011 at 11:06 AM, Steven A Rowe sar...@syr.edu wrote: On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. And personally I would be totally fine with this, where maven is in /dev-tools, just like eclipse and idea configuration, and we can even put a whole README.txt in there that says these are tools for developers and if they start rotting they will be deleted without a second thought. But requiring special artifacts is a different story; it's my understanding that in anything but a hello-world maven project you need your own local repository anyway. So such a person can simply install their own artifacts with /dev-tools into their local repository... problem solved. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982695#action_12982695 ] Shay Banon commented on LUCENE-2474: - Yea, I got the reasoning for Set; we can use that, CHM with PRESENT. If you want, I can attach a simple MapBackedSet that makes any Map a Set. Still, I think that using CopyOnWriteArrayList is best here. I don't think that adding and removing listeners is something that will be done often in an app. But I might be mistaken. In this case, traversal over listeners is much better on CopyOnWriteArrayList compared to CHM. Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Attachments: LUCENE-2474.patch, LUCENE-2474.patch Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example, with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
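A minimal sketch of the MapBackedSet idea mentioned above: adapt any Map (e.g. a ConcurrentHashMap) into a Set by storing a dummy PRESENT value. Illustrative only, not necessarily what would get committed:
{code}
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;

public class MapBackedSet<E> extends AbstractSet<E> {
  private static final Object PRESENT = new Object();
  private final Map<E, Object> map;

  public MapBackedSet(Map<E, Object> map) {
    this.map = map;
  }

  @Override public int size() { return map.size(); }
  @Override public boolean contains(Object o) { return map.containsKey(o); }
  @Override public boolean add(E e) { return map.put(e, PRESENT) == null; }
  @Override public boolean remove(Object o) { return map.remove(o) != null; }
  @Override public Iterator<E> iterator() { return map.keySet().iterator(); }
}
{code}
Usage would be something like new MapBackedSet<Listener>(new ConcurrentHashMap<Listener, Object>()), giving Set semantics on top of CHM's concurrency.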
Re: Let's drop Maven Artifacts !
You're not alone. :) But, I bet, many more people would like to skip that step and have their artifacts downloaded from central. On Mon, Jan 17, 2011 at 19:06, Steven A Rowe sar...@syr.edu wrote: On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. Steve -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Let's drop Maven Artifacts !
On Mon, Jan 17, 2011 at 11:17 AM, Earwin Burrfoot ear...@gmail.com wrote: You're not alone. :) But, I bet, many more people would like to skip that step and have their artifacts downloaded from central. Maybe, but perhaps they will need to compromise and use jar files or install into their local repository themselves, because currently they have to use an svn checkout, since we are letting maven issues prevent us from releasing. I think it's been too long since we had a release; I'm gonna forget maven exists and start working towards a release. I'll cross my fingers and hope that I can get 3 +1 votes. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2832) on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2832: Fix Version/s: (was: 3.1) I am removing 3.1 as I think it's the safest option. We can revisit if someone is willing to test parameters on enormous indexes (200GB, 500GB, 1TB, ...); otherwise we are just guessing. on Windows 64-bit, maybe we should default to a better maxBBufSize in MMapDirectory --- Key: LUCENE-2832 URL: https://issues.apache.org/jira/browse/LUCENE-2832 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-2832.patch Currently the default max buffer size for MMapDirectory is 256MB on 32bit and Integer.MAX_VALUE on 64bit: {noformat} public static final int DEFAULT_MAX_BUFF = Constants.JRE_IS_64BIT ? Integer.MAX_VALUE : (256 * 1024 * 1024); {noformat} But, on 64-bit Windows, you are practically limited to 8TB. This can cause problems in extreme cases, such as: http://www.lucidimagination.com/search/document/7522ee54c46f9ca4/map_failed_at_getsearcher Perhaps it would be good to change this default such that it's 256MB on 32Bit *OR* Windows, but leave it at Integer.MAX_VALUE on other 64-bit (48-bit) systems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1674) improve analysis tests, cut over to new API
[ https://issues.apache.org/jira/browse/SOLR-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982740#action_12982740 ] Robert Muir commented on SOLR-1674: --- I'd still like to add posinc tests for some of these tokenstreams, but also for other ones in the analyzers module (e.g. ones from Lucene contrib). I'll set 3.2 for now. improve analysis tests, cut over to new API --- Key: SOLR-1674 URL: https://issues.apache.org/jira/browse/SOLR-1674 Project: Solr Issue Type: Test Components: Schema and Analysis Reporter: Robert Muir Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-1674.patch, SOLR-1674.patch, SOLR-1674_speedup.patch This patch * converts all analysis tests to use the new tokenstream api * converts most tests to use the more stringent assertion mechanisms from lucene * adds new tests to improve coverage Most bugs found by more stringent testing have been fixed, with the exception of SynonymFilter. The problems with this filter are more serious; the previous tests were essentially a no-op. The new tests for SynonymFilter test the current behavior, but have FIXMEs with what I think the old test wanted to expect in the comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-1674) improve analysis tests, cut over to new API
[ https://issues.apache.org/jira/browse/SOLR-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-1674: -- Fix Version/s: (was: 3.1) (was: 1.5) improve analysis tests, cut over to new API --- Key: SOLR-1674 URL: https://issues.apache.org/jira/browse/SOLR-1674 Project: Solr Issue Type: Test Components: Schema and Analysis Reporter: Robert Muir Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-1674.patch, SOLR-1674.patch, SOLR-1674_speedup.patch This patch * converts all analysis tests to use the new tokenstream api * converts most tests to use the more stringent assertion mechanisms from lucene * adds new tests to improve coverage Most bugs found by more stringent testing have been fixed, with the exception of SynonymFilter. The problems with this filter are more serious, the previous tests were essentially a no-op. The new tests for SynonymFilter test the current behavior, but have FIXMEs with what I think the old test wanted to expect in the comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Closed: (LUCENE-2552) If index is pre-3.0 IndexWriter does not fail on open
[ https://issues.apache.org/jira/browse/LUCENE-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-2552. - Resolution: Duplicate Lucene Fields: (was: [New]) Duplicate of LUCENE-2720 If index is pre-3.0 IndexWriter does not fail on open - Key: LUCENE-2552 URL: https://issues.apache.org/jira/browse/LUCENE-2552 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Uwe Schindler Priority: Minor Fix For: 3.1, 4.0 IndexReader.open() fails for all old pre-3.0 indexes in Lucene trunk. This is tested by TestBackwardCompatibility. On the other hand, IndexWriter's ctor does not fail on opening an existing index that contains an old segment, because it does not check preexisting segments. It only throws IndexFormatTooOldException if you merge segments or open a getReader(). When ConcurrentMergeScheduler is used, this may happen in a foreign thread, which makes it even worse. Mike and I propose: - In 3.x, introduce a new segments file format when committing that contains the oldest and newest version of the index segments (not sure which version number to take here); this file format has a new version, so it's easy to detect (DefaultSegmentsFileWriter/Reader) - In trunk, when opening IndexWriter check the following: If the segments file is in the new format, check the minimum version from this file; if pre-3.0, throw IFTOE. If the segments file is in the old format (can be a 3.0 or 3.x index not yet updated), try to open FieldsReader, as 2.9 indexes can only be detected using this - older indexes should fail before and never come to that place. If this succeeds, write a new segments file in the new format (maybe after commit or whatever) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
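A hypothetical sketch of the proposed ctor check (simplified; exception construction and version handling here are illustrative, not the code that landed in LUCENE-2720):
{code}
// Fail fast in IndexWriter's ctor if any pre-3.0 segment is present.
for (SegmentInfo si : segmentInfos) {
  String v = si.getVersion(); // e.g. "2.9" or "3.0"; null for very old segments
  if (v == null || v.startsWith("1.") || v.startsWith("2.")) {
    throw new IndexFormatTooOldException("segment " + si.name, v);
  }
}
{code}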
[jira] Updated: (SOLR-2279) Add a MockDirectoryFactory (or similar) for Solr tests
[ https://issues.apache.org/jira/browse/SOLR-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2279: -- Fix Version/s: (was: 3.1) Moving out... I don't see myself fixing this test issue very quickly. Add a MockDirectoryFactory (or similar) for Solr tests -- Key: SOLR-2279 URL: https://issues.apache.org/jira/browse/SOLR-2279 Project: Solr Issue Type: Test Components: Build Reporter: Robert Muir Fix For: 4.0 Attachments: SOLR-2279.patch Currently, all Lucene tests open directories with newDirectory() [and the soon-to-be-added newFSDirectory(), which always ensures the directory returned is an FSDir subclass; see LUCENE-2804 for this]. Additionally the directory is wrapped with MockDirectoryWrapper. This has a number of advantages: * By default the directory implementation is random, but you can easily specify a specific impl, e.g. -Dtests.directory=MMapDirectory. When proposing a change to one of our directory implementations, we can run all tests with it this way... it would be good for Solr tests to respect this too. * The test framework (LuceneTestCase before/afterclass) ensures that these directories are properly closed; if not, it causes the test to fail with a stacktrace of where you first opened the directory. * MockDirectoryWrapper.close() then ensures that there are no resource leaks by default: when you open a file, it saves the stacktrace of where you opened it from. If you try to close the directory without, say, closing an IndexReader, it fails with the stacktrace of where you opened the reader from. This is helpful for tracking down resource leaks. Currently Solr warns if it cannot delete its test temporary directory, but this is better since you know exactly where the resource leak came from. This can be disabled with an optional setter, which we should probably expose for some tests that have known leaks like SpellCheck. * MockDirectoryWrapper enforces consistent test behavior on any operating system, as it won't be dependent on the return value of FSDirectory.open * MockDirectoryWrapper has a number of other checks and features, such as simulating a crash, simulating disk full, emulating windows (where you can't delete open files), etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
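As a reference point, the Lucene-side pattern the issue describes looks like this in a test (standard LuceneTestCase usage of the time; the test body itself is illustrative):
{code}
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.LuceneTestCase;

public class TestSomething extends LuceneTestCase {
  public void testSomething() throws Exception {
    Directory dir = newDirectory(); // random impl, wrapped in MockDirectoryWrapper
    // ... open and close writers/readers against dir ...
    dir.close(); // fails the test with a stacktrace if anything leaked
  }
}
{code}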
[jira] Updated: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2374: -- Summary: Add reflection API to AttributeSource/AttributeImpl (was: Add introspection API to AttributeSource/AttributeImpl) Add reflection API to AttributeSource/AttributeImpl --- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2374-3x.patch AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term->foobar, startOffset->Integer.valueOf(0),... - AttributeSource gets the same method; it just concatenates the iterators of each AttributeImpl from getAttributeImplsIterator() No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the entries), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl for toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2261) layout.vm refers to old version of jquery
[ https://issues.apache.org/jira/browse/SOLR-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2261. --- Resolution: Fixed This was not downloading the file at all, instead it was getting a 404 error as Eric described. Committed revision 1060014. Thanks Eric! layout.vm refers to old version of jquery - Key: SOLR-2261 URL: https://issues.apache.org/jira/browse/SOLR-2261 Project: Solr Issue Type: Bug Components: web gui Reporter: Eric Pugh Priority: Minor Fix For: 3.1 The velocity template layout.vm that includes jquery refers to an older 1.2.3 version of jquery: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/example/solr/conf/velocity/layout.vm Checked in is a new 1.4.3 version: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/src/webapp/web/admin/ The line that says: <script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.2.3.min.js"></script> should be changed to: <script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.4.3.min.js"></script> -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Let's drop Maven Artifacts !
On Jan 17, 2011, at 8:06 AM, Steven A Rowe wrote: On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. +1, you're not. The only way I've ever used Lucene has been via a Maven dependency, and that was the original way I found it starting way back in lucene-core-2.0.0. If Lucene wasn't in Maven, it would be a HUGE disappointment, and an impediment towards using it. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982761#action_12982761 ] Yonik Seeley commented on LUCENE-2474: -- bq. Still, I think that using CopyOnWriteArrayList is best here. Agree - I think we should optimize for good/correct behavior. I'd like even more for there to be just a single CopyOnWriteArrayList per top-level reader that is then propagated to all sub/segment readers, including new ones on a reopen. But I guess Mike indicated that was currently too hard/hairy. The static is really non-optimal though - among other problems, it requires systems with multiple readers (that want to do different things with different readers, such as maintaining separate caches) to figure out what top-level reader a segment reader is associated with. And given that we are dealing with IndexReader instances in the callbacks, and not ReaderContext objects, this seems impossible? Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Attachments: LUCENE-2474.patch, LUCENE-2474.patch Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example, with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-849) Add bwlimit support to snappuller
[ https://issues.apache.org/jira/browse/SOLR-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved SOLR-849. --- Resolution: Duplicate Implemented in SOLR-2099. Add bwlimit support to snappuller - Key: SOLR-849 URL: https://issues.apache.org/jira/browse/SOLR-849 Project: Solr Issue Type: Improvement Components: replication (scripts) Reporter: Otis Gospodnetic Priority: Minor Attachments: SOLR-849.patch From http://markmail.org/message/njnbh5gbb2mvfe24 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Let's drop Maven Artifacts !
On 1/17/2011 at 11:25 AM, Robert Muir wrote: On Mon, Jan 17, 2011 at 11:06 AM, Steven A Rowe sar...@syr.edu wrote: On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. And personally I would be totally fine with this, where maven is in /dev-tools, just like eclipse and idea configuration, and we can even put a whole README.txt in there that says these are tools for developers and if they start rotting they will be deleted without a second thought. But requiring special artifacts is a different story I have it wrong in LUCENE-2657: it creates special artifacts intended for publishing via public Maven repositories. But for the purposes of publishing (as opposed to locally modified sources), the artifacts published through public Maven repositories should be *exactly* the same ones produced by the Ant build, with the obvious exception of the POMs. This is the model used by previous releases, and if we continue the tradition of publishing Maven artifacts (as we have since the 1.9.1 release), the model should not change. Steve
[jira] Resolved: (SOLR-2259) Improve analyzer/version handling in Solr
[ https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2259. --- Resolution: Fixed Improve analyzer/version handling in Solr - Key: SOLR-2259 URL: https://issues.apache.org/jira/browse/SOLR-2259 Project: Solr Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259_part3.patch, SOLR-2259part2.patch, SOLR-2259part4.patch We added Version for backwards compatibility support in Lucene. We use this to fire deprecated code to emulate old versions to ensure index backwards compat. Related: we deprecate old analysis components and eventually remove them. To hook into Solr, at first it defaulted to Version 2.4 emulation everywhere, with the example having the latest. If you don't specify a version in your solrconfig, it defaults to 2.4, though. However, as of LUCENE-2781, 2.4 is removed, but users with old configs that don't specify a version should not be silently upgraded to the Version 3.0 emulation... this is bad. Additionally, when users are using deprecated emulation or using deprecated factories they might not know it, and it might come as a surprise if they upgrade, especially if they aren't looking at Java APIs or Java code. I propose: # in trunk: we make the solrconfig luceneMatchVersion mandatory. This is simple: Uwe already has a method that will error out if it's not present; we just use that. # in 3.x: we warn if you don't specify luceneMatchVersion in solrconfig, telling you that it's going to be required in 4.0 and that you are defaulting to 2.4 emulation. For example: Warning: luceneMatchVersion is not specified in solrconfig.xml. Defaulting to 2.4 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0. This parameter will be mandatory in 4.0. # in 3.x,trunk: we warn if you are using a deprecated matchVersion constant somewhere in general, even for a specific tokenizer, telling you that you need to at some point reindex with a current version before you can move to the next release. For example: Warning: you are using 2.4 emulation, at some point you need to bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0 # in 3.x,trunk: we warn if you are using a deprecated TokenStreamFactory so that you know it's going to be removed. For example: Warning: the ISOLatin1FilterFactory is deprecated and will be removed in the next release. You should migrate to ASCIIFoldingFilterFactory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
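For reference, the setting in question is a single element in solrconfig.xml (the value shown is illustrative; use the constant matching the version your index was built with):
{code}
<luceneMatchVersion>LUCENE_40</luceneMatchVersion>
{code}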
[jira] Commented: (SOLR-2269) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt
[ https://issues.apache.org/jira/browse/SOLR-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982794#action_12982794 ] Robert Muir commented on SOLR-2269: --- I just realized I've made the same mistake (somehow, I never noticed these contribs had their own CHANGES.txt files). I'll start working on sorting out these CHANGES.txt files and synchronizing them in branch_3x/trunk to be consistent. contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt Key: SOLR-2269 URL: https://issues.apache.org/jira/browse/SOLR-2269 Project: Solr Issue Type: Task Components: contrib - Clustering, contrib - DataImportHandler, contrib - Solr Cell (Tika extraction) Affects Versions: 3.1, 4.0 Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 http://www.lucidimagination.com/search/document/b8c19488a691265c/changes_mess {quote} I realized that some entries for DIH are in solr/CHANGES.txt. These should go solr/contrib/dataimporthandler/CHANGES.txt (Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt have 1.5-dev title. These should be 4.0-dev or 3.1-dev. {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982797#action_12982797 ] Chris A. Mattmann commented on LUCENE-2657: --- Hey Guys, I've set this up on some other Apache projects (Nutch, Tika [NetCDF4] and SIS so far), and basically it involved: 1. modding build.xml according to Sonatype's guide (see the build.xml section) https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide 2. adding pom.xmls for each artifact to be published I'll throw together a patch for this and see if I can't make this process a bit easier. Thanks. Cheers, Chris Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2315) analysis.jsp highlight matches no longer works
[ https://issues.apache.org/jira/browse/SOLR-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982822#action_12982822 ] Uwe Schindler commented on SOLR-2315: - I found the bug; I will fix it together with the analysis.jsp rewrite in LUCENE-2374 (this changes lots of internals, so it's easy to fix there). The problem is that a non-generified List[] in printRow causes a wrong contains() lookup that always returns false, so matching tokens are never seen. analysis.jsp highlight matches no longer works Key: SOLR-2315 URL: https://issues.apache.org/jira/browse/SOLR-2315 Project: Solr Issue Type: Bug Components: web gui Reporter: Hoss Man Fix For: 3.1, 4.0 As noted by Teruhiko Kurosaka on the mailing list, at some point since Solr 1.4, highlight matches stopped working on the analysis.jsp -- on both the 3x and trunk branches -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
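The general failure mode looks like this (a standalone illustration of the raw-type pitfall, not the actual analysis.jsp code):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RawTypeGotcha {
  public static void main(String[] args) {
    List[] rows = new List[1]; // raw, non-generified List[]
    rows[0] = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
    // With the raw type the compiler cannot flag a wrongly-typed probe, so
    // contains() silently compares a String against Integers:
    System.out.println(rows[0].contains("2")); // always false
  }
}
{code}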
[jira] Resolved: (SOLR-2269) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt
[ https://issues.apache.org/jira/browse/SOLR-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2269. --- Resolution: Fixed Committed revision 1060057, 1060058 (3x) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt Key: SOLR-2269 URL: https://issues.apache.org/jira/browse/SOLR-2269 Project: Solr Issue Type: Task Components: contrib - Clustering, contrib - DataImportHandler, contrib - Solr Cell (Tika extraction) Affects Versions: 3.1, 4.0 Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 http://www.lucidimagination.com/search/document/b8c19488a691265c/changes_mess {quote} I realized that some entries for DIH are in solr/CHANGES.txt. These should go solr/contrib/dataimporthandler/CHANGES.txt (Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt have 1.5-dev title. These should be 4.0-dev or 3.1-dev. {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2160) Unknown query type 'func'
[ https://issues.apache.org/jira/browse/SOLR-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2160. --- Resolution: Fixed Marking as fixed... please reopen if you think there might still be a bug, but again I haven't seen issues in a very long time Unknown query type 'func' - Key: SOLR-2160 URL: https://issues.apache.org/jira/browse/SOLR-2160 Project: Solr Issue Type: Test Components: Build Affects Versions: 3.1, 4.0 Environment: Hudson Reporter: Robert Muir Fix For: 3.1, 4.0 Attachments: SOLR-2160.patch Several test methods in TestTrie failed in hudson, with errors such as this: Caused by: org.apache.solr.common.SolrException: Unknown query type 'func' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Let's drop Maven Artifacts !
On 1/17/11 8:06 AM, Steven A Rowe wrote: On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. This is something I would feel comfortable not supporting in Lucene out-of-the-box, because if someone needs to use modified sources it's not unreasonable to expect that they can also create their own pom files for the modified jars. I do think though that we should keep publishing official artifacts to a central repo. Michael - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Let's drop Maven Artifacts !
On 1/17/2011 at 3:05 PM, Michael Busch wrote: On 1/17/11 8:06 AM, Steven A Rowe wrote: On 1/17/2011 at 1:53 AM, Michael Busch wrote: I don't think any user needs the ability to run an ant target on Lucene's sources to produce maven artifacts I want to be able to make modifications to the Lucene source, install Maven snapshot artifacts in my local repository, then depend on those snapshots from other projects. I doubt I'm alone. This is something I would feel comfortable not supporting in Lucene out-of-the-box, because if someone needs to use modified sources it's not unreasonable to expect that they can also create their own pom files for the modified jars. This makes zero sense to me - no one will ever make their own POMs, except maybe the empty shells Maven will auto-create for you when you run the install:install-file goal. The key thing that LUCENE-2657 provides is POMs that can be verified correct via Maven itself - when Maven performs a build, the POMs are checked for correctness, and if the build fails, you can tell something is wrong. Anything short of that won't cut it long term. Maybe from your perspective building the project with the POMs is unnecessary, but from mine it is a *requirement*. And, happily IMHO, users get local build/install for free. Steve
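For context, installing a locally built jar with such an auto-created shell POM is a one-liner (standard Maven usage; the file name and coordinates below are illustrative):
{code}
mvn install:install-file -Dfile=lucene-core-4.0-SNAPSHOT.jar \
    -DgroupId=org.apache.lucene -DartifactId=lucene-core \
    -Dversion=4.0-SNAPSHOT -Dpackaging=jar
{code}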
[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level
[ https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982872#action_12982872 ] Doron Cohen commented on LUCENE-2236: - {quote} well my concern about the deprecated methods is we get into the hairy backwards compat situation... It might be ok to essentially fix Similarity to be the way we want for 4.0 (break it) since its an expert API anyway. This patch was just a quick stab... I definitely agree with you about the name though, i prefer Similarity. {quote} So let's keep that name (Similarity) :) {quote} Well honestly I think what you are saying is really needed for the future {quote} Ok, one step at a time makes sense... so it means that fieldName parameters remain, although the Similarity object is created per given field; well, ok, another day... {quote} Similarity would need to be able to 'setup' a query (e.g. things like IDF, building score caches for the query, whatever), and then also score an individual document. {quote} Interesting... (the flexible-scoring and bulk-postings work is still an unknown to me.) So Similarity is not only per field but also per query/scorer... and Query would have an abstract method getSimilarityProvider(fieldName) which would be implemented by each concrete query, neatly separating finding matches from score computation, and allowing more extendable scoring. Nice. Also, perhaps what seems like an inflation of Similarity objects (per query per field) is one more good reason to keep the field name params for now. Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level - Key: LUCENE-2236 URL: https://issues.apache.org/jira/browse/LUCENE-2236 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 3.0 Reporter: Paul taylor Assignee: Robert Muir Attachments: LUCENE-2236.patch Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level; to facilitate this, could we make the field name available to all score methods. Currently it is only passed to some, such as lengthNorm(), but not others, such as tf()
Re: Let's drop Maven Artifacts !
On 1/17/11 12:27 PM, Steven A Rowe wrote: This makes zero sense to me - no one will ever make their own POMs I did :) (for a different project though).
[jira] Commented: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level
[ https://issues.apache.org/jira/browse/LUCENE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982888#action_12982888 ] Robert Muir commented on LUCENE-2236: - bq. So let's keep that name (Similarity) OK, I'll fix the patch to rename FieldSimilarity -> Similarity {quote} So Similarity is not only per field but also per query/scorer... and Query would have an abstract method getSimilarityProvider(fieldName) which would be implemented by each concrete query, neatly separating finding matches from score computation, and allowing more extendable scoring. Nice. Also, perhaps what seems like an inflation of Similarity objects (per query per field) is one more good reason to keep the field name params for now. {quote} Well, I'm not totally sure how we want to do it, but definitely I think we want to split Scorer's calculations and finding matches as you say, and also split Weight's calculations and resource management. For example, TermWeight today has a PerReaderTermState, which contains all the information you need to calculate the setup portion without doing any real I/O (e.g. docFreq, totalTermFreq, totalCollectionFreq, ...). So maybe this is the right thing to pass to Similarity's query setup. The Weight then would just be responsible for managing termstate and creating a Scorer... I think also the Similarity needs to be fully responsible for Explanations... but most users wouldn't have to interact with this I think. Instead I think their base class (TFIDFSimilarity or whatever it is) would typically provide this, based on the methods and API it exposes: tf(), idf(); but this would allow us to also have other fully fleshed-out base classes like BM25Similarity, that you can extend and tune based on the parameters that make sense to it. Anyway these are just some thoughts; first I'm going to adjust the patch to keep our existing name Similarity. Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level - Key: LUCENE-2236 URL: https://issues.apache.org/jira/browse/LUCENE-2236 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 3.0 Reporter: Paul taylor Assignee: Robert Muir Attachments: LUCENE-2236.patch Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level; to facilitate this, could we make the field name available to all score methods. Currently it is only passed to some, such as lengthNorm(), but not others, such as tf()
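A minimal sketch of the split being discussed above; the names (PerFieldSimilarity, setupQuery, DocScorer) are assumptions invented for illustration and are not the API from the patch. One object consumes the per-query statistics once; the cheap per-document hook is what a Scorer would call per match.

{code}
// Illustrative only: names below are assumed, not from the LUCENE-2236 patch.
public interface PerFieldSimilarity {
  // per-query, per-field setup: consume stats like docFreq once (IDF, caches, ...)
  DocScorer setupQuery(String field, int docFreq, int maxDoc);

  interface DocScorer {
    // cheap per-matching-document computation, called once per hit
    float score(int docID, int termFreq);
  }
}
{code}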
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982900#action_12982900 ] Michael Busch commented on LUCENE-2324: --- My last commit yesterday made almost all test cases pass. The ones that test flush-by-ram are still failing. Also TestStressIndexing2 still fails. The reason has to do with how deletes are pushed into bufferedDeletes. E.g. if I call addDocument() instead of updateDocument() in TestStressIndexing.IndexerThread then the test passes. I need to look more into that problem, but otherwise it's looking good and we're pretty close! Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2324: - Attachment: LUCENE-2324.patch Very nice! Looks like we needed all kinds of IW syncs? I noticed that in addition to TestStressIndexing2, TestNRTThreads was also failing. The attached patch fixes both by adding a sync on DW for deletes (and the update doc delete term). Time to add the RAM usage? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
Re: Release schedule Lucene 4?
Yes! Mike On Mon, Jan 17, 2011 at 7:47 AM, Shai Erera ser...@gmail.com wrote: This sounds like incremental field updates :). Shai On Mon, Jan 17, 2011 at 1:24 PM, Michael McCandless luc...@mikemccandless.com wrote: On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: But: they don't yet support updating the values (the goal is to allow this, eventually). This is just the first step. No? Hmm... I thought that was a main part of the functionality? Patches welcome ;) Seriously, how would you do it? IE, I don't like how norms handle it today -- on changing a single value we must write the full array (for all docs). Same problem w/ del docs, though since it's 1 bit per doc the cost is far less. Better would be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value), and at init we load the base and apply all the diffs (in order). Merging would periodically coalesce them down again... Mike
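A toy rendering of that stacked idea, under the assumption that a value column is a long[] and each delta is a (docID, newValue) pair; the class and method names here are invented for illustration, not taken from any patch.

{code}
import java.util.ArrayList;
import java.util.List;

// Toy sketch of "stacked" updates: the original full array stays as written,
// sparse deltas are logged, and at init the diffs are replayed in order.
public class StackedValues {
  static long[] applyDeltas(long[] base, List<long[]> deltas) {
    long[] current = base.clone();     // the base array is never rewritten
    for (long[] d : deltas) {
      current[(int) d[0]] = d[1];      // d = {docID, newValue}; later deltas win
    }
    return current;                    // a merge would coalesce deltas into a new base
  }

  public static void main(String[] args) {
    long[] base = {10, 20, 30};
    List<long[]> deltas = new ArrayList<long[]>();
    deltas.add(new long[] {1, 99});    // doc 1 gets the new value 99
    System.out.println(applyDeltas(base, deltas)[1]); // prints 99
  }
}
{code}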
Re: Release schedule Lucene 4?
I am hoping that it'll be in 2011... but don't hold me to that. It's really not possible to predict! You can always use a trunk version and give feedback :) But beware that it's unstable, meaning APIs and the index format can suddenly change. Mike On Mon, Jan 17, 2011 at 8:51 AM, Gregor Heinrich gre...@arbylon.net wrote: Hi Mike, all -- a (sorrily slow) thanks for this response ;) From the ensuing discussion, it sounds like there's a LOT to be in v4, and not raising wrong expectations by giving dates is appreciated ;) Only thing is, are we talking any time in 2012 or 2011, just to have a coarse-grained estimate without any assumptions attached? Best gregor On 1/15/11 3:20 PM, Michael McCandless wrote: This is unfortunately hard to say! There's tons of good stuff in 4.0, so we'd really like to release sooner rather than later. But then there's also a lot of work remaining, eg we have 3 feature branches in flight right now, that we need to wrap up and land on trunk: * realtime (gives us concurrent flushing during indexing) * docvalues (adds column-stride fields) * bulkpostings (gives good search speedup for intblock codecs) Plus many open Jira issues. So it's hard to predict when all of this will be done. Mike On Fri, Jan 14, 2011 at 12:31 PM, Gregor Heinrich gre...@arbylon.net wrote: Dear Lucene team, I am wondering whether there is an updated Lucene release schedule for the v4.0 stream. Any earliest/latest alpha/beta/stable date? And if not yet, where to track such info? Thanks in advance from Germany gregor
[jira] Updated: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2374: -- Attachment: shot2.png shot1.png LUCENE-2374-3x.patch New patch with analysis.jsp fixed (also SOLR-2315): - highlighting works again - only attributes are shown that exist at that step of the analysis - attribute names changed a little bit, because it uses the ones from the reflection api - Attribute class name is shown on mouse hover - start/endOffset are now on two different lines (no way to do it otherwise without another special case) - payloads are no longer printed as text, because that used the default platform encoding (new String(byte[]))! I also added some example screenshots! Add reflection API to AttributeSource/AttributeImpl --- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, shot2.png AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term -> foobar, startOffset -> Integer.valueOf(0), ... - AttributeSource gets the same method; it just concats the iterators of each getAttributeImplsIterator() AttributeImpl. No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the contents), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl of toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor.
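For readers skimming the proposal, a minimal sketch of what such a default could look like; the helper names key() and value() are assumptions for illustration, and the actual patch may shape the generics differently.

{code}
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Sketch of the proposed default: most attributes expose a single key->value pair.
public abstract class AttributeSketch {
  // e.g. {"startOffset" -> Integer.valueOf(0)} for an offset attribute
  public Iterator<Map.Entry<String, Object>> contentsIterator() {
    Map<String, Object> single = Collections.singletonMap(key(), value());
    return single.entrySet().iterator();
  }

  protected abstract String key();     // assumed helper, illustration only
  protected abstract Object value();   // assumed helper, illustration only
}
{code}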
[jira] Updated: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2374: -- Attachment: shot4.png shot3.png Add reflection API to AttributeSource/AttributeImpl --- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, shot2.png, shot3.png, shot4.png AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term -> foobar, startOffset -> Integer.valueOf(0), ... - AttributeSource gets the same method; it just concats the iterators of each getAttributeImplsIterator() AttributeImpl. No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the contents), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl of toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor.
[jira] Commented: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982921#action_12982921 ] Robert Muir commented on LUCENE-2374: - +1, this looks great. I think it's really important to show all the attributes in analysis.jsp, e.g. KeywordAttribute. Add reflection API to AttributeSource/AttributeImpl --- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, shot2.png, shot3.png, shot4.png AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term -> foobar, startOffset -> Integer.valueOf(0), ... - AttributeSource gets the same method; it just concats the iterators of each getAttributeImplsIterator() AttributeImpl. No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the contents), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl of toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor.
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982926#action_12982926 ] Jason Rutherglen commented on LUCENE-2324: -- Looks like TestNRTThreads is still sometimes failing; if I move the sync around, it passes but TestStressIndexing2 fails. Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
[jira] Commented: (LUCENE-2547) minimize autoboxing in NumericField
[ https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982927#action_12982927 ] Simon Willnauer commented on LUCENE-2547: - bq. I didn't dive deep into the details of this issue, but what will someone who has only long/int (and not their counter objects) do? Will he need to create a Long/Integer out of them? Shai, there are primitive setters already though. bq. The parameters cannot be null, so at least a null-check is missing agreed bq. If you try this out with a profiler you see no difference at all (loop creating a field and setting lots of values) - the objects are short-lived so the JRE optimizes (allocates on thread-local heap). agreed! I think calling setLong(longRef.longValue()) is not a big deal and an API change / addition is not needed here. Moving out? minimize autoboxing in NumericField --- Key: LUCENE-2547 URL: https://issues.apache.org/jira/browse/LUCENE-2547 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0.2 Reporter: Woody Anderson Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2547.patch If you already have an Integer/Long/Double etc., numericField.setLongValue(long) causes an unnecessary auto-unbox. Actually, since internal to setLongValue there is: {code} fieldsData = Long.valueOf(value); {code} then there is an explicit box anyway, so this makes setLongValue(Long) with an auto-box of long roughly the same as setLongValue(long), but better if you started with a Long. Long being replaceable with Integer, Float, Double etc.
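A self-contained illustration of the point; the class and field names are invented here, and only the fieldsData line mirrors the snippet quoted in the issue.

{code}
// Why the boxed overload buys little: the primitive setter boxes once anyway.
public class BoxingDemo {
  private Object fieldsData;

  void setLongValue(long value) {
    fieldsData = Long.valueOf(value);   // one explicit box, always
  }

  public static void main(String[] args) {
    BoxingDemo f = new BoxingDemo();
    Long ref = Long.valueOf(42);        // caller already holds a Long
    f.setLongValue(ref);                // auto-unbox here, re-box inside
    f.setLongValue(ref.longValue());    // same cost, no new API needed
  }
}
{code}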
[jira] Commented: (LUCENE-2547) minimize autoboxing in NumericField
[ https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982928#action_12982928 ] Uwe Schindler commented on LUCENE-2547: --- That's what I'm talking about: don't clutter the API with such useless and inconsistent stuff minimize autoboxing in NumericField --- Key: LUCENE-2547 URL: https://issues.apache.org/jira/browse/LUCENE-2547 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0.2 Reporter: Woody Anderson Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2547.patch If you already have an Integer/Long/Double etc., numericField.setLongValue(long) causes an unnecessary auto-unbox. Actually, since internal to setLongValue there is: {code} fieldsData = Long.valueOf(value); {code} then there is an explicit box anyway, so this makes setLongValue(Long) with an auto-box of long roughly the same as setLongValue(long), but better if you started with a Long. Long being replaceable with Integer, Float, Double etc.
[jira] Resolved: (LUCENE-2547) minimize autoboxing in NumericField
[ https://issues.apache.org/jira/browse/LUCENE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2547. - Resolution: Won't Fix moving out minimize autoboxing in NumericField --- Key: LUCENE-2547 URL: https://issues.apache.org/jira/browse/LUCENE-2547 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0.2 Reporter: Woody Anderson Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2547.patch If you already have an Integer/Long/Double etc., numericField.setLongValue(long) causes an unnecessary auto-unbox. Actually, since internal to setLongValue there is: {code} fieldsData = Long.valueOf(value); {code} then there is an explicit box anyway, so this makes setLongValue(Long) with an auto-box of long roughly the same as setLongValue(long), but better if you started with a Long. Long being replaceable with Integer, Float, Double etc.
[jira] Commented: (LUCENE-2374) Add reflection API to AttributeSource/AttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982931#action_12982931 ] Simon Willnauer commented on LUCENE-2374: - nice work uwe!! +1 ;) Add reflection API to AttributeSource/AttributeImpl --- Key: LUCENE-2374 URL: https://issues.apache.org/jira/browse/LUCENE-2374 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2374-3x.patch, LUCENE-2374-3x.patch, shot1.png, shot2.png, shot3.png, shot4.png AttributeSource/TokenStream inspection in Solr needs to have some insight into the contents of AttributeImpls. As LUCENE-2302 has some problems with toString() [which is not structured and conflicts with CharSequence's definition for CharTermAttribute], I propose a simple API that gets a default implementation in AttributeImpl (just like toString() currently): - Iterator<Map.Entry<String,?>> AttributeImpl.contentsIterator() returns an iterator (for most attributes it's a singleton) of key-value pairs, e.g. term -> foobar, startOffset -> Integer.valueOf(0), ... - AttributeSource gets the same method; it just concats the iterators of each getAttributeImplsIterator() AttributeImpl. No backwards problems occur, as the default toString() method will work like before (it just gets the iterator and lists the contents), but we simply remove the documentation for the format. (Char)TermAttribute gets a special impl of toString() according to CharSequence and a corresponding iterator. I also want to remove the abstract hashCode() and equals() methods from AttributeImpl, as they are not needed and just create work for the implementor.
[jira] Created: (LUCENE-2872) Terms dict should block-encode terms
Terms dict should block-encode terms Key: LUCENE-2872 URL: https://issues.apache.org/jira/browse/LUCENE-2872 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2872.patch With PrefixCodedTermsReader/Writer we now encode each term standalone, ie its bytes, metadata, details for postings (frq/prox file pointers), etc. But, this is costly when something wants to visit many terms but pull metadata for only a few (eg respelling, certain MTQs). This is particularly costly for the sep codec because it has more metadata to store, per term. So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored column-stride instead. This makes it faster to enum just terms.
[jira] Updated: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2872: --- Attachment: LUCENE-2872.patch Patch. I think it's basically working, but there are still a bunch of nocommits. Terms dict should block-encode terms Key: LUCENE-2872 URL: https://issues.apache.org/jira/browse/LUCENE-2872 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2872.patch With PrefixCodedTermsReader/Writer we now encode each term standalone, ie its bytes, metadata, details for postings (frq/prox file pointers), etc. But, this is costly when something wants to visit many terms but pull metadata for only a few (eg respelling, certain MTQs). This is particularly costly for the sep codec because it has more metadata to store, per term. So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored column-stride instead. This makes it faster to enum just terms.
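A toy sketch of the column-stride layout being described, with an assumed on-disk format (plain DataOutputStream instead of Lucene's IndexOutput) purely to show the shape: all docFreqs for a block of terms are written together, then all freq-pointer deltas, so a terms-only scan can skip whole columns.

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Assumed format, illustration only: metadata for one block of terms,
// written column-stride rather than interleaved per-term vInts.
public class BlockTermsSketch {
  static byte[] writeBlock(int[] docFreqs, long[] frqDeltas) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(docFreqs.length);                 // terms between index points
    for (int df : docFreqs) out.writeInt(df);      // column: docFreq per term
    for (long d : frqDeltas) out.writeLong(d);     // column: frq file deltas
    out.flush();
    return buf.toByteArray();                      // a reader can seek past unused columns
  }
}
{code}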
[jira] Resolved: (LUCENE-2654) bulk-code each chunk b/w indexed terms in the terms dict
[ https://issues.apache.org/jira/browse/LUCENE-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2654. - Resolution: Duplicate duplicate of LUCENE-2872 bulk-code each chunk b/w indexed terms in the terms dict Key: LUCENE-2654 URL: https://issues.apache.org/jira/browse/LUCENE-2654 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Michael McCandless Priority: Minor This is an idea for exploration that came up w/ Robert... In PrefixCodedTermsDict (used by the default Standard codec), we encode each term entry standalone, using vInts. We store the changed suffix (start, end, bytes), then metadata for the term like docFreq, frq start, prx start, skip start. Each of these ints is a vInt, which is relatively costly. If instead we store the N terms between indexed terms column-stride, using a bulk codec like FOR/PFOR, so that the 32 docFreqs are stored as one block, 32 frq deltas as another, etc., then seek and next should be faster. Ie, we could make decode of the metadata lazy, so that a seek to a term that does not exist may be able to avoid any metadata decode entirely. Sequential scanning (lots of .next in a row) would also be faster, even if it needs the metadata, since bulk decode should be faster than multiple vInt decodes.
Lucene-3.x - Build # 245 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/245/ All tests passed Build Log (for compile errors): [...truncated 21049 lines...]
Windows test failure VelocityResponseWriter, unmodified trunk.
H, a fresh, unmodified checkout of Solr will fail on my Windows7 box if I run ant -Dtestcase=VelocityResponseWriterTest test. It succeeds on my Mac. Anyone got a clue? Or should I look into it? Of course it succeeds in IntelliJ. S

The error reported is:

junit-sequential:
    [junit] Testsuite: org.apache.solr.velocity.VelocityResponseWriterTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 3.242 sec
    [junit] - Standard Error -
    [junit] NOTE: reproduce with: ant test -Dtestcase=VelocityResponseWriterTest -Dtestmethod=testTemplateName -Dtests.seed=7323578340428606364:2660469109353774457
    [junit] NOTE: test params are: codec=RandomCodecProvider: {}, locale=ar_MA, timezone=America/Indiana/Vevay
    [junit] NOTE: all tests run in this JVM:
    [junit] [VelocityResponseWriterTest]
    [junit] NOTE: Windows 7 6.1 x86/Sun Microsystems Inc. 1.6.0_21 (32-bit)/cpus=4,threads=1,free=13281704,total=16252928
    [junit] - ---
    [junit]
    [junit] Testcase: testTemplateName took 3.126 sec
    [junit] Caused an ERROR
    [junit] org.apache.log4j.Logger.setAdditivity(Z)V
    [junit] java.lang.NoSuchMethodError: org.apache.log4j.Logger.setAdditivity(Z)V
    [junit] at org.apache.velocity.runtime.log.Log4JLogChute.initAppender(Log4JLogChute.java:126)
    [junit] at org.apache.velocity.runtime.log.Log4JLogChute.init(Log4JLogChute.java:85)
    [junit] at org.apache.velocity.runtime.log.LogManager.createLogChute(LogManager.java:157)
    [junit] at org.apache.velocity.runtime.log.LogManager.updateLog(LogManager.java:255)
    [junit] at org.apache.velocity.runtime.RuntimeInstance.initializeLog(RuntimeInstance.java:795)
    [junit] at org.apache.velocity.runtime.RuntimeInstance.init(RuntimeInstance.java:250)
    [junit] at org.apache.velocity.app.VelocityEngine.init(VelocityEngine.java:107)
    [junit] at org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWriter.java:131)
    [junit] at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:39)
    [junit] at org.apache.solr.velocity.VelocityResponseWriterTest.testTemplateName(VelocityResponseWriterTest.java:22)
    [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127)
    [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059)
    [junit]
    [junit] Test org.apache.solr.velocity.VelocityResponseWriterTest FAILED

BUILD FAILED
C:\apache-trunk-unmodified\solr\build.xml:383: The following error occurred while executing this line:
C:\apache-trunk-unmodified\solr\build.xml:487: Tests failed!

Erick
Lucene-trunk - Build # 1429 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1429/ All tests passed Build Log (for compile errors): [...truncated 16590 lines...]
[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12982997#action_12982997 ] Lance Norskog commented on SOLR-445: bq. 2 From the original post, rolling this back will be tricky. Very tricky. The autocommit feature makes it indeterminate what's been committed to the index, so I don't know how to even approach rolling back everything. Don't allow autocommits during an update. Simple. Or, rather, all update requests block at the beginning during an autocommit. If an update request has too many documents, don't do so many documents in an update. XmlUpdateRequestHandler bad documents mid batch aborts rest of batch Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.3 Reporter: Will Johnson Assignee: Erick Erickson Fix For: Next Attachments: SOLR-445.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid batch. Ie:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
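Lance's blocking scheme maps naturally onto a reader/writer lock; a sketch under that assumption only (class and method names invented here, not from any Solr patch): update batches share the read lock, and autocommit takes the write lock so a commit can never land mid-batch.

{code}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration of "update requests block during an autocommit", names assumed.
public class CommitGate {
  private final ReadWriteLock gate = new ReentrantReadWriteLock();

  public void runBatch(Runnable addDocs) {
    gate.readLock().lock();        // many batches may run concurrently
    try { addDocs.run(); } finally { gate.readLock().unlock(); }
  }

  public void autoCommit(Runnable commit) {
    gate.writeLock().lock();       // blocks until in-flight batches finish
    try { commit.run(); } finally { gate.writeLock().unlock(); }
  }
}
{code}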
Re: Windows test failure VelocityResponseWriter, unmodified trunk.
On Mon, Jan 17, 2011 at 10:42 PM, Erick Erickson erickerick...@gmail.com wrote: H, a fresh, unmodified checkout of Solr will fail on my Windows7 box if I run ant -Dtestcase=VelocityResponseWriterTest test. It succeeds on my Mac. Anyone got a clue? Or should I look into it? Of course it succeeds in IntelliJ. S My windows laptop took a vacation (a permanent one) so I can't verify. But when I see NoSuchMethod runtime exceptions, I usually try a fresh checkout first. It's sometimes just stuff not getting cleaned up properly. -Yonik http://www.lucidimagination.com
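One generic way to check the stale-classpath theory behind a NoSuchMethodError (standard JVM introspection, nothing Solr-specific) is to print which jar the suspect class was actually loaded from:

{code}
import java.security.CodeSource;

// Prints the origin of org.apache.log4j.Logger to spot a stale/duplicate jar.
public class WhichJar {
  public static void main(String[] args) throws Exception {
    Class<?> c = Class.forName("org.apache.log4j.Logger");
    CodeSource src = c.getProtectionDomain().getCodeSource();
    // CodeSource can be null for bootstrap-loaded classes
    System.out.println(src == null ? "bootstrap classpath" : src.getLocation());
  }
}
{code}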
[jira] Commented: (SOLR-2316) SynonymFilterFactory should ensure synonyms argument is provided.
[ https://issues.apache.org/jira/browse/SOLR-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983018#action_12983018 ] Yonik Seeley commented on SOLR-2316: Does this affect trunk also, or just the 3.x branch? SynonymFilterFactory should ensure synonyms argument is provided. - Key: SOLR-2316 URL: https://issues.apache.org/jira/browse/SOLR-2316 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: David Smiley Priority: Minor Fix For: 3.1 Attachments: 2316.patch If for some reason the synonyms attribute is not present on the filter factory configuration, a latent NPE will eventually show up during indexing/searching. Instead a helpful error should be thrown at initialization.
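The fix the issue asks for is a fail-fast check at init time; a sketch assuming the usual factory args map (the names here are illustrative, not copied from the attached patch):

{code}
import java.util.Map;

// Fail fast with a clear message instead of a latent NPE later.
public class SynonymArgsCheck {
  static String requireSynonyms(Map<String, String> args) {
    String synonyms = args.get("synonyms");
    if (synonyms == null) {
      throw new IllegalArgumentException(
          "Missing required argument 'synonyms' for SynonymFilterFactory");
    }
    return synonyms;
  }
}
{code}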
[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2324: - Attachment: LUCENE-2324.patch Ok, TestNRTThreads works after 10+ iterations. TestStressIndexing2 works most of the time; however, with enough iterations, eg ant test-core -Dtestcase=TestStressIndexing2 -Dtests.iter=30, it fails. I think that deletes are sneaking in because we're not sync'ed on DW as we're flushing the DWPT. Ideally some assertions would pick this up. Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
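A reduction of the suspected race, with an invented class standing in for the DocumentsWriter monitor (this is not the actual patch): the buffered deletes and the flush-time snapshot must serialize on the same lock, or a delete can sneak in mid-flush.

{code}
import java.util.ArrayList;
import java.util.List;

// Illustration only: both paths synchronize on one DW-level lock.
public class DeleteSync {
  private final Object dwLock = new Object();   // stands in for the DW monitor
  private final List<String> bufferedDeletes = new ArrayList<String>();

  public void deleteTerm(String term) {
    synchronized (dwLock) {                     // writer threads queue deletes
      bufferedDeletes.add(term);
    }
  }

  public List<String> snapshotForFlush() {
    synchronized (dwLock) {                     // flush sees a consistent snapshot
      List<String> snap = new ArrayList<String>(bufferedDeletes);
      bufferedDeletes.clear();
      return snap;
    }
  }
}
{code}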