[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979502#action_12979502 ] JohnWu commented on SOLR-1395: -- TomLiu: on the slave node, is katta.node.properties also set as follows?
#node.server.class=net.sf.katta.lib.lucene.LuceneServer
node.server.class=org.apache.solr.katta.DeployableSolrKattaServer
Integrate Katta --- Key: SOLR-1395 URL: https://issues.apache.org/jira/browse/SOLR-1395 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: Next Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar Original Estimate: 336h Remaining Estimate: 336h We'll integrate Katta into Solr so that: * distributed search uses Hadoop RPC * shards/SolrCores are distributed and managed by Katta * failover is handled via Zookeeper * indexes may be built using Hadoop -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Plugin Idea: Index configuration at runtime?
Hi all, I'd like to contribute to Apache Solr with a small plugin, but I'd like to have your opinion first. In my project, the user wants to configure the indexes on the web app side, not in solrconfig.xml. For example, the user can select an "Ignore case" option for the index myindex. This way, at indexing or querying time, a Lowercase filter must be applied. My idea was to develop a SolrPlugin that will use configuration keys sent at runtime together with the string to index or query. Based on the configuration keys, the plugin will apply the right tokenizer/filters: {lowercase=true,stopwords={this,the,to,is}} This is the string to index! At index or query time, the SolrPlugin will apply the filters, and the string becomes "string index". What do you think about such a plugin? Thanks, Tanguy
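The idea can be sketched in plain Java (all class and method names below are invented for illustration; this is not Solr's actual plugin/analysis API): the configuration keys sent at runtime decide which filters are applied to the tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the proposed plugin's behavior: configuration keys
// sent at runtime (lowercase=true, a stopword set) choose the filter chain.
public class RuntimeConfigAnalyzer {
    private final boolean lowercase;
    private final Set<String> stopwords;

    public RuntimeConfigAnalyzer(boolean lowercase, Set<String> stopwords) {
        this.lowercase = lowercase;
        this.stopwords = stopwords;
    }

    // Tokenize on non-word characters, then apply the configured filters.
    public List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) continue;
            String t = lowercase ? token.toLowerCase(Locale.ROOT) : token;
            if (stopwords.contains(t)) continue; // drop configured stopwords
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // {lowercase=true, stopwords={this,the,to,is}} applied to
        // "This is the string to index!"
        RuntimeConfigAnalyzer a = new RuntimeConfigAnalyzer(
                true, new HashSet<>(Arrays.asList("this", "the", "to", "is")));
        System.out.println(a.analyze("This is the string to index!"));
        // prints [string, index]
    }
}
```

The key difference from the usual setup is that the filter chain is chosen per request from the supplied keys rather than fixed once in solrconfig.xml.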
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979522#action_12979522 ] Earwin Burrfoot commented on LUCENE-2312: - Some questions to align myself with impending reality. Is it right that future RT readers are no longer immutable snapshots (in the sense that they have a variable maxDoc)? If so, are you keeping the current NRT mode, with fast turnaround yet immutable readers? Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch In order to offer users near-realtime search without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Today's Lucene-based NRT systems must incur the cost of merging segments, which can slow indexing. Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table. Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
Re: Lucene-trunk - Build # 1421 - Failure
OK I snatched this index and indeed I can reproduce the OOME during CheckIndex! The hunt begins :) Mike
On Sun, Jan 9, 2011 at 10:35 PM, Robert Muir rcm...@gmail.com wrote:
On Sun, Jan 9, 2011 at 9:40 PM, Apache Hudson Server hud...@hudson.apache.org wrote:
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/
1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads Error Message: CheckIndex failed
maybe this is specific to pulsing? I noticed it's failed 3 times with this identical pulsing stacktrace: Lucene-trunk/1421, tests-only/3590, tests-only/3570 However, this time it failed in a nightly build (perhaps the indexes are still available on the hudson machine if we salvage before the next nightly build?) it should be under lucene/build/test/N/jrecrashXXtmp/ all 3 times the stacktrace is:
test: terms, freq, prox...ERROR [Java heap space]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Position.clone(PulsingPostingsWriterImpl.java:104)
    at org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Document.clone(PulsingPostingsWriterImpl.java:74)
    at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingTermState.clone(PulsingPostingsReaderImpl.java:72)
    at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingDocsEnum.reset(PulsingPostingsReaderImpl.java:234)
    at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl.docs(PulsingPostingsReaderImpl.java:189)
    at org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.docs(PrefixCodedTermsReader.java:515)
    at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:756)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:489)
    at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:83)
    at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
    at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
    at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
    at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
[jira] Resolved: (SOLR-2296) Upgrade Carrot2 binaries to version 3.4.2
[ https://issues.apache.org/jira/browse/SOLR-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-2296. -- Resolution: Fixed Thanks, I committed the 1.5 jar file. trunk: Committed revision 1057149. 3x: Committed revision 1057150. Unfortunately, I don't think the 3.4.2 jar solves the SOLR-2282 test issue (see the error above on trunk), but I'm marking this issue as resolved. I'll look at the test error in SOLR-2282. Upgrade Carrot2 binaries to version 3.4.2 - Key: SOLR-2296 URL: https://issues.apache.org/jira/browse/SOLR-2296 Project: Solr Issue Type: Task Components: contrib - Clustering Reporter: Stanislaw Osinski Assignee: Koji Sekiguchi Fix For: 3.1, 4.0 Attachments: carrot2-core-3.4.2-jdk1.5.jar, carrot2-core-3.4.2.jar, SOLR-2296-branch_3.1.patch, SOLR-2296-trunk.patch Version 3.4.2 fixes a concurrency bug in Carrot2 that may be causing SOLR-2282. I'll attach patches in a minute.
Re: Intermitted Failure on TestFST
I too cannot repro with this seed (separately: not good!). I started up a while(1) on beast but so far no failure... h. Mike On Mon, Jan 10, 2011 at 5:37 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I just ran into this while working on LUCENE-2694. I cannot reproduce this so far, but I don't think this is related to my changes.
[junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
[junit] Testcase: testRealTerms(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
[junit] expected:59507 but was:16982
[junit] junit.framework.AssertionFailedError: expected:59507 but was:16982
[junit] at org.apache.lucene.util.automaton.fst.TestFSTs.assertSame(TestFSTs.java:1058)
[junit] at org.apache.lucene.util.automaton.fst.TestFSTs.testRealTerms(TestFSTs.java:1018)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)
[junit]
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 13.364 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: Ignoring nightly-only test method 'testBigSet'
[junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs -Dtestmethod=testRealTerms -Dtests.seed=6383451564727439626:0
[junit] NOTE: test params are: codec=RandomCodecProvider: {id=MockSep, body=MockRandom, title=MockRandom, titleTokenized=Standard, date=MockSep}, locale=ar, timezone=America/Argentina/Tucuman
[junit] NOTE: all tests run in this JVM: [TestToken, TestAddIndexes, TestCrash, TestFlex, TestIndexWriter, TestIndexWriterOnJRECrash, TestMultiReader, TestOmitTf, TestPersistentSnapshotDeletionPolicy, TestSnapshotDeletionPolicy, TestTransactionRollback, TestBooleanScorer, TestDateSort, TestFieldCacheTermsFilter, TestMultiTermQueryRewrites, TestPositionIncrement, TestRegexpQuery, TestSimpleExplanations, TestTermScorer, TestDocValues, TestFieldMaskingSpanQuery, TestSpansAdvanced, TestBufferedIndexInput, TestRAMDirectory, TestArrayUtil, TestFieldCacheSanityChecker, TestSmallFloat, TestFSTs]
Re: Lucene-trunk - Build # 1421 - Failure
OK, so this looks to be caused by 1) the fact that we are indexing Greek stop words with the TestNRTThreads test, and 2) the Pulsing codec being horribly inefficient in how it handles pulsed terms that have many, many positions. But it's odd we've only hit this failure in the JRE crash test... I've added a CheckIndex to TestNRTThreads itself and I'll see if that too can provoke an OOME. It should. Mike
On Mon, Jan 10, 2011 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote: OK I snatched this index and indeed I can reproduce the OOME during CheckIndex! The hunt begins :) Mike
[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2694: Attachment: LUCENE-2694.patch Another iteration on this after LUCENE-2831 was committed last week. - Updated to trunk; all tests pass. - Re-added all ord() related stuff back to TermsEnum, since I think we should decouple this and solve it in a different issue. There are already enough changes in here, and discussions should be focused on making MTQ single pass. - Changed IndexSearcher to run concurrent searches on a leaf slice rather than on a leaf converted to a top-level context. That made the callables a bit simpler and is more consistent, since the hierarchy is preserved. - TermState is now referenced by leaf ordinal and asserted using the leaf's top-level ctx. - TermQuery is now single pass for all queries, while state is only held in Weight unless a PerReaderTermState was set. But even then, the top-level ctx must be identical to the given IS's top-level ctx, otherwise the given PerReaderTermState is not used. This one seems pretty close. MTQ rewrite + weight/scorer init should be single pass -- Key: LUCENE-2694 URL: https://issues.apache.org/jira/browse/LUCENE-2694 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite.
Re: Consolidating FunctionQuery
+1 for having a single function query - actually this is what LUCENE-1081 and SOLR-192 are about. I'd have looked at this after LUCENE-1812, but that has been waiting so long now; please go ahead with this, I'll follow/join. -0 for moving function to modules - I think this is used as a core capability by many applications/users and I don't see why it should be in modules. To me this is more like... I was going to say like payloads, but this is not true, it is not nearly as involved and as internal as payloads, so I replaced -1 with -0 here. But could you explain why move this from core to modules? Doron On Mon, Jan 10, 2011 at 1:03 PM, Chris Male gento...@gmail.com wrote: +1 to this idea. I recall talking to Robert and Mark about it as a good first step as part of the spatial code consolidation as well. On Mon, Jan 10, 2011 at 11:59 PM, Simon Willnauer simon.willna...@googlemail.com wrote: hey, today I came across function query in lucene and that reminded me that Solr is already using its own derived version, which is no good IMO. We should try to consolidate the two versions and make solr use the consolidated version, which would even be good for lucene users. It seems it would make lots of sense to make the entire function query stuff a module and drop it from core in 4.0. I didn't look too closely into the solr version but it seems to be not that hard to luceneify it and move it to modules, thoughts? simon -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] Updated: (SOLR-2310) DocBuilder's getTimeElapsedSince Error
[ https://issues.apache.org/jira/browse/SOLR-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2310: - Priority: Trivial (was: Major) Affects Version/s: (was: 4.0) 1.3 1.4 1.4.1 Fix Version/s: 4.0 3.1 DocBuilder's getTimeElapsedSince Error -- Key: SOLR-2310 URL: https://issues.apache.org/jira/browse/SOLR-2310 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4, 1.4.1 Environment: JDK1.6 Reporter: tom liu Assignee: Koji Sekiguchi Priority: Trivial Fix For: 3.1, 4.0 I have a job which runs for about 65 hours, but the dataimport?command=status HTTP request returns 5 hours. In the getTimeElapsedSince method of DocBuilder:
{noformat}
static String getTimeElapsedSince(long l) {
  l = System.currentTimeMillis() - l;
  return (l / (60 * 60 * 1000)) % 60 + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000;
}
{noformat}
The hours computation is wrong; it should be:
{noformat}
static String getTimeElapsedSince(long l) {
  l = System.currentTimeMillis() - l;
  return (l / (60 * 60 * 1000)) + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000;
}
{noformat}
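The wrap-around the reporter describes is easy to demonstrate: reducing the hours term modulo 60 makes any elapsed time over 60 hours roll over. A small standalone sketch (hypothetical class name; the bodies mirror the buggy and fixed expressions, with the elapsed time passed in directly rather than derived from System.currentTimeMillis()):

```java
public class ElapsedTimeDemo {
    // Buggy variant: the hours term is wrapped with % 60, so 65 hours
    // elapsed is reported as 5 hours.
    static String buggy(long elapsedMillis) {
        long l = elapsedMillis;
        return (l / (60 * 60 * 1000)) % 60 + ":" + (l / (60 * 1000)) % 60
                + ":" + (l / 1000) % 60 + "." + l % 1000;
    }

    // Fixed variant: no modulo on the hours term, since hours is the
    // most significant unit and should not wrap.
    static String fixed(long elapsedMillis) {
        long l = elapsedMillis;
        return (l / (60 * 60 * 1000)) + ":" + (l / (60 * 1000)) % 60
                + ":" + (l / 1000) % 60 + "." + l % 1000;
    }

    public static void main(String[] args) {
        long sixtyFiveHours = 65L * 60 * 60 * 1000;
        System.out.println(buggy(sixtyFiveHours)); // prints 5:0:0.0
        System.out.println(fixed(sixtyFiveHours)); // prints 65:0:0.0
    }
}
```

Minutes, seconds, and milliseconds still need their modulo because each is bounded by the next-larger unit; only the leading hours term must be left unwrapped.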
[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979553#action_12979553 ] Michael McCandless commented on LUCENE-2843: bq. If a system is forced to swap out, it'll swap your explicitly managed RAM just as likely as memory-mapped files. In fact, even if it's not under any real memory pressure, the OS will swap out your not-recently-accessed RAM. Net/net this is a good policy if your metric is total throughput accomplished by all programs. But if your metric is latency to search queries, this is an awful policy. Fortunately OSs (at least Windows & Linux) give you some tunability here. Unfortunately, the tunable is global, and it defaults badly for those programs that do make a careful distinction b/w what data structures are best held in RAM and what data is best left on disk. If I could, I would offer an option to pin these pages so the OS cannot swap them out, but I don't think we can (easily) do that from javaland (and I think you'd have to be root). Lacking pinning, the best (approximation) we can do is pull these ourselves into RAM. Add variable-gap terms index impl. -- Key: LUCENE-2843 URL: https://issues.apache.org/jira/browse/LUCENE-2843 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2843.patch, LUCENE-2843.patch PrefixCodedTermsReader/Writer (used by all real core codecs) already supports pluggable terms index impls. The only impl we have now is FixedGapTermsIndexReader/Writer, which picks every Nth (default 32) term and holds it in efficient packed int/byte arrays in RAM. This is already an enormous improvement (RAM reduction, init time) over 3.x. This patch adds another impl, VariableGapTermsIndexReader/Writer, which lets you specify an arbitrary IndexTermSelector to pick which terms are indexed, and then uses an FST to hold the indexed terms. This is typically even more memory efficient than packed int/byte arrays, though it does not support ord(), so it's not quite a fair comparison. I had to relax the terms index plugin api for PrefixCodedTermsReader/Writer to not assume that the terms index impl supports ord. I also did some cleanup of the FST/FSTEnum APIs and impls, and broke out separate seekCeil and seekFloor in FSTEnum. Eg we need seekFloor when the FST is used as a terms index, but seekCeil when it's holding all terms in the index (ie what SimpleText uses FSTs for).
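Lacking page pinning, "pulling the bytes into RAM ourselves" amounts to reading the terms-index file into a heap array at open time rather than relying on the OS page cache. A minimal, hypothetical sketch (not Lucene's actual code; class and method names are invented):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration: hold the terms-index bytes on the JVM heap.
// Unlike file-backed pages, heap-resident bytes cannot be evicted from the
// page cache independently; they can only be swapped out together with the
// rest of the JVM heap.
public class HeapLoadedTermsIndex {
    private final byte[] bytes;

    // Read the whole terms-index file into a heap byte[] once, at open time.
    public HeapLoadedTermsIndex(Path termsIndexFile) throws IOException {
        this.bytes = Files.readAllBytes(termsIndexFile);
    }

    // All subsequent lookups read from the heap-resident copy.
    public byte byteAt(long offset) {
        return bytes[(int) offset];
    }

    public int length() {
        return bytes.length;
    }
}
```

The trade-off is exactly the one debated above: the heap copy costs RAM up front, but search-time lookups never take a page fault for data the OS decided to evict.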
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979554#action_12979554 ] Michael McCandless commented on LUCENE-2312: I believe the goal for RT readers is still point in time reader semantics. Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979558#action_12979558 ] Michael McCandless commented on LUCENE-2324: I think on commit, if we hit an aborting exception flushing a given DWPT, we throw it then and there. Any segs already flushed remain flushed (but not committed). Any segs not yet flushed remain not yet flushed... Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process, and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores, and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
Re: Consolidating FunctionQuery
On Mon, Jan 10, 2011 at 12:56 PM, Doron Cohen cdor...@gmail.com wrote: +1 for having a single function query - actually this is what LUCENE-1081 and SOLR-192 are about. I'd look at this after LUCENE-1812, but this is waiting so long now, please go ahead with this, I'll follow/join. good stuff... -0 for moving function to modules - I think this is used as core capability by many applications/users and I don't see why it should be in modules, as to me this is more like .. I was going to say like payloads but this is not true, it is not nearly as involved and as internal as payloads, so replaced -1 with -0 here, but, could you explain why move this from core to modules? So moving stuff to modules has several advantages compared to trunk. I think we all agree that we want lucene to be able to make use of this functionality, right?! So if we keep it in core, we can either try to make it right this time, or we stick with bw compat for a long time again, risking that the same thing happens again and solr changes stuff in incompatible ways. While this is a minor risk, I see FQ as not really a core part of lucene but rather something that is similar to Analysis: a package with a small base interface (which could stay in lucene core) and a large, useful impl base, like all the funcs in solr. So it seems to me that modules is the right place for at least those implementations. does that make sense?
[jira] Commented: (LUCENE-2831) Revise Weight#scorer Filter#getDocIdSet API to pass Readers context
[ https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979561#action_12979561 ] Simon Willnauer commented on LUCENE-2831: - bq. And also Collector? yeah I think that one can move to ARC too. {code} ValueSource#getValues(IndexReader) {code} is another one Revise Weight#scorer Filter#getDocIdSet API to pass Readers context - Key: LUCENE-2831 URL: https://issues.apache.org/jira/browse/LUCENE-2831 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, boolean, boolean) we should / could revise the API and pass in a struct that has parent reader, sub reader, ord of that sub. The ord mapping plus the context with its parent would make several issues way easier. See LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.
[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979562#action_12979562 ] Robert Muir commented on LUCENE-2694: - One question: I'm looking at the definition of TermState: {noformat} Holds all state required for {...@link TermsEnum} to produce a {...@link DocsEnum} without re-seeking the terms dict. {noformat} So why do we have seek(BytesRef, TermState)? Shouldn't it just be seek(TermState)? I think it's confusing that it takes an unnecessary bytes parameter. MTQ rewrite + weight/scorer init should be single pass -- Key: LUCENE-2694 URL: https://issues.apache.org/jira/browse/LUCENE-2694 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 4.0
Re: Consolidating FunctionQuery
On Mon, Jan 10, 2011 at 2:13 PM, Simon Willnauer simon.willna...@googlemail.com wrote: So moving stuff to modules has several advantages compared to trunk. I think we all agree that we want lucene to be able to make use of this functionality, right?! So if we keep it in core, we can either try to make it right this time, or we stick with bw compat for a long time again, risking that the same thing happens again and solr changes stuff in incompatible ways. While this is a minor risk, I see FQ as not really a core part of lucene but rather something that is similar to Analysis: a package with a small base interface (which could stay in lucene core) and a large, useful impl base, like all the funcs in solr. So it seems to me that modules is the right place for at least those implementations. does that make sense? Do you mean that modules are not subject to backcompat guidelines? I was not aware of that. Relieving the backcompat burden is a motivation by itself, but that aside, function query (or custom score query) seems like a basic capability to me - it is not something that both Lucene and Solr are using, but rather something that Lucene is providing and Solr and user applications are using, right?
[jira] Issue Comment Edited: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979496#action_12979496 ] Steven Rowe edited comment on LUCENE-2657 at 1/10/11 8:03 AM: -- Added profiles to populate internal repositories at {{lucene/dist/maven/}} and {{solr/dist/maven/}} with generated artifacts. To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc artifacts, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy cd lucene mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy cd ../modules mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy {code} To populate {{solr/dist/maven/}}, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap install cd solr mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar deploy {code} was (Author: steve_rowe): Added profiles to populate internal repositories at {{lucene/dist/maven/}} and {{solr/dist/maven/}} with generated artifacts. 
To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc artifacts, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy cd lucene mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy cd ../modules mvn -DskipTests -Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy {code} To populate {{lucene/dist/solr/}}, run the following from the top level Lucene/Solr directory: {code} mvn -N -P bootstrap install cd solr mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar deploy {code} Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. 
From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2694: Attachment: LUCENE-2694_hack.patch Here's a hack patch (don't think it actually works) just showing what I mean. I think TermsEnum should only have seek(TermState). In the hack patch, I made the termState() and seekTermState() non-abstract: the default impl returns a 'SimpleTermState' containing the term bytes and saved docFreq, and implements seek(TermState) with those bytes. This is basically what the patch had everywhere anyway as an implementation (for many of these, we should use more efficient impls; I fixed this for MemoryIndex as an example, but MultiTermsEnum comes to mind). Also, I don't understand what was going on with setting bytes on the DeltaBytesReader with your seek(BytesRef, TermState) before. If StandardCodec needs to know the shared byte[] prefix or something like that to reposition the enum, then it should put this in its termstate. MTQ rewrite + weight/scorer init should be single pass -- Key: LUCENE-2694 URL: https://issues.apache.org/jira/browse/LUCENE-2694 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694_hack.patch Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
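For readers following along, here is a minimal sketch of the 'SimpleTermState' idea described in the comment above: capture the term bytes plus stats the enum already computed, so a later seek(TermState) can restore position without re-reading the terms dictionary. The class and field names below are illustrative assumptions, not Lucene's actual API.

```java
// Illustrative sketch only (not Lucene's actual API): a term state that
// captures the term's bytes plus a stat the enum already computed, so a
// later seek(TermState) can restore position without re-seeking the
// terms dictionary.
public class SimpleTermState {
    final byte[] termBytes; // defensive copy of the term bytes at capture time
    final int docFreq;      // saved stat, avoids recomputation after seek

    public SimpleTermState(byte[] termBytes, int docFreq) {
        // Copy because a TermsEnum typically reuses its internal buffer
        // between next() calls; a captured state must not alias it.
        this.termBytes = termBytes.clone();
        this.docFreq = docFreq;
    }
}
```

The defensive copy is the key design point: since the enum reuses its term buffer, a state that merely aliased the buffer would silently change as iteration continued.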
RE: LICENSE/NOTICE file contents
Everyone should (carefully) read the Apache License 2.0 section 4(d). It turns out that Apache has a somewhat unusual definition for the term derivative work. It has to be something you actually modified, not just include. So the incubator approach seems correct; neither the HSQLDB notice nor the Jetty notice belong in the Solr NOTICE.txt file. For ManifoldCF, I just moved them to the end of the README.txt. The old notice text is different from the corresponding LICENSE.txt text in both cases, so it did not make sense to either eliminate them or move them to LICENSE.txt. Thanks, Karl -Original Message- From: ext Robert Muir [mailto:rcm...@gmail.com] Sent: Saturday, January 08, 2011 10:16 AM To: dev@lucene.apache.org; yo...@lucidimagination.com Subject: Re: LICENSE/NOTICE file contents On Sat, Jan 8, 2011 at 10:06 AM, Yonik Seeley yo...@lucidimagination.com wrote: There also wasn't any business about and then add _nothing_ unless you can find explicit policy documented somewhere in the ASF that says it is required. I was following examples from other projects and any docs I could find at the time, but this was back in '06. Not sure there is now either, this is likely just someone's opinion. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2657: Attachment: LUCENE-2657.patch carrot2 dependency upgraded to 3.4.2: SOLR-2296 Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. 
To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache
[ https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979588#action_12979588 ] Christian Kohlschütter commented on LUCENE-2500: I guess it would not be difficult to add Mac OS X support (via F_NOCACHE)? see http://evanjones.ca/write-latency-alignment.html A Linux-specific Directory impl that bypasses the buffer cache -- Key: LUCENE-2500 URL: https://issues.apache.org/jira/browse/LUCENE-2500 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2500.patch I've been testing how we could prevent Lucene's merges from evicting pages from the OS's buffer cache. I tried fadvise/madvise (via JNI) but (frustratingly), I could not get them to work (details at http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html). The only thing that worked was to use Linux's O_DIRECT flag, which forces all IO to bypass the buffer cache entirely... so I created a Linux-specific Directory impl to do this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979589#action_12979589 ] David Smiley commented on LUCENE-2611: -- I don't know why the vcs.xml change isn't working for you, but it's absolutely wonderful for the commit log history to show the JIRA references as links that work. FWIW, I'm using IntelliJ 10. I understand RE workspace.xml. IntelliJ IDEA and Eclipse setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x-part2.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming. The attached patches add a new top level directory {{dev-tools/}} with sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, as well as top-level ant targets named idea and eclipse that copy these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit run configuration per module is included. The Eclipse configuration includes a source entry for each source/test/resource location and classpath setup: a library entry for each jar. 
For IDEA, once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. For Eclipse, once {{ant eclipse}} has been run, the user has to refresh the project (right-click on the project and choose Refresh). If these patches are committed, Subversion svn:ignore properties should be added/modified to ignore the destination IDEA and Eclipse configuration locations. Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch for IDEA: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2310) DocBuilder's getTimeElapsedSince Error
[ https://issues.apache.org/jira/browse/SOLR-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979593#action_12979593 ] Koji Sekiguchi commented on SOLR-2310: -- Good catch, Tom. I'll commit shortly. DocBuilder's getTimeElapsedSince Error -- Key: SOLR-2310 URL: https://issues.apache.org/jira/browse/SOLR-2310 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4, 1.4.1 Environment: JDK1.6 Reporter: tom liu Assignee: Koji Sekiguchi Priority: Trivial Fix For: 3.1, 4.0 I have a job that runs for about 65 hours, but the dataimport?command=status HTTP request returns 5 hours. In the getTimeElapsedSince method of DocBuilder: {noformat} static String getTimeElapsedSince(long l) { l = System.currentTimeMillis() - l; return (l / (1000 * 60 * 60)) % 60 + ":" + (l / (1000 * 60)) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000; } {noformat} The hours computation is wrong; it should be: {noformat} static String getTimeElapsedSince(long l) { l = System.currentTimeMillis() - l; return (l / (1000 * 60 * 60)) + ":" + (l / (1000 * 60)) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000; } {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2310) DocBuilder's getTimeElapsedSince Error
[ https://issues.apache.org/jira/browse/SOLR-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-2310. -- Resolution: Fixed trunk: Committed revision 1057221. 3x: Committed revision 1057226. DocBuilder's getTimeElapsedSince Error -- Key: SOLR-2310 URL: https://issues.apache.org/jira/browse/SOLR-2310 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4, 1.4.1 Environment: JDK1.6 Reporter: tom liu Assignee: Koji Sekiguchi Priority: Trivial Fix For: 3.1, 4.0 I have a job that runs for about 65 hours, but the dataimport?command=status HTTP request returns 5 hours. In the getTimeElapsedSince method of DocBuilder: {noformat} static String getTimeElapsedSince(long l) { l = System.currentTimeMillis() - l; return (l / (1000 * 60 * 60)) % 60 + ":" + (l / (1000 * 60)) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000; } {noformat} The hours computation is wrong; it should be: {noformat} static String getTimeElapsedSince(long l) { l = System.currentTimeMillis() - l; return (l / (1000 * 60 * 60)) + ":" + (l / (1000 * 60)) % 60 + ":" + (l / 1000) % 60 + "." + l % 1000; } {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
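To make the SOLR-2310 wraparound concrete, here is a small self-contained demo of the hours bug in getTimeElapsedSince. The class name and exact millisecond constants are illustrative assumptions rather than Solr's verbatim code; the point is that applying "% 60" to the hours term wraps 65 hours around to 5, exactly as reported.

```java
// Demo of the DocBuilder elapsed-time bug: constants and names here are
// illustrative assumptions, not necessarily Solr's exact code. The buggy
// shape applies "% 60" to the hours term, so 65 elapsed hours wrap
// around and display as 5.
public class ElapsedDemo {
    // Buggy shape: hours wrap at 60 because of the trailing % 60
    static String buggy(long l) {
        return (l / (1000 * 60 * 60)) % 60 + ":" + (l / (1000 * 60)) % 60
                + ":" + (l / 1000) % 60 + "." + l % 1000;
    }

    // Fixed shape: no modulo on the hours term
    static String fixed(long l) {
        return (l / (1000 * 60 * 60)) + ":" + (l / (1000 * 60)) % 60
                + ":" + (l / 1000) % 60 + "." + l % 1000;
    }

    public static void main(String[] args) {
        long elapsed = 65L * 60 * 60 * 1000; // 65 hours in milliseconds
        System.out.println(buggy(elapsed)); // 5:0:0.0
        System.out.println(fixed(elapsed)); // 65:0:0.0
    }
}
```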
[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979606#action_12979606 ] Ryan McKinley commented on LUCENE-2657: --- This is looking good! IIUC, this will be a parallel build system to ant. The build and test are independent of anything the ant build does. If we take this route, we should probably drop the -pom.xml.templates and the generate-maven-artifacts target. Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. 
From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Lucene-Solr-tests-only-3.x - Build # 3594 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3594/ 1 tests failed. REGRESSION: org.apache.lucene.search.TestThreadSafe.testLazyLoadThreadSafety Error Message: unable to create new native thread Stack Trace: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:614) at org.apache.lucene.search.TestThreadSafe.doTest(TestThreadSafe.java:133) at org.apache.lucene.search.TestThreadSafe.testLazyLoadThreadSafety(TestThreadSafe.java:152) at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:255) Build Log (for compile errors): [...truncated 8565 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979625#action_12979625 ] Jason Rutherglen commented on LUCENE-2324: -- bq. Any segs already flushed remain flushed (but not committed). Any segs not yet flushed remain not yet flushed... If the segments are flushed, will they be deleted? Or will they be made available in a subsequent and completely successful commit? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO and CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979631#action_12979631 ] Jason Rutherglen commented on LUCENE-2312: -- {quote}Is that right that future RT readers are no longer immutable snapshots (in a sense that they have variable maxDoc)?{quote} The RT readers'll be point-in-time. There are many mechanisms to make this happen that mainly revolve around a static maxDoc per reader while allowing some of the underlying data structures to change during indexing. There are two overall design issues right now and that is how to handle norms and the system.arraycopy per getReader to create static read only parallel upto arrays. I think system.arraycopy should be fast enough given it's a native instruction on Intel. And for norms we may need to relax their accuracy in order to create less garbage. That would involve either using a byte[][] for point-in-timeness or a byte[] that is recalculated only as it's grown (meaning newer readers created since the last array growth may see a slightly inaccurate norm value). The norm byte[] would essentially be grown every N docs. Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch In order to offer user's near realtime search, without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Todays Lucene based NRT systems must incur the cost of merging segments, which can slow indexing. 
Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table. Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
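The norms design sketched in the comment above (a growing array, snapshotted per getReader with System.arraycopy so each reader keeps a fixed maxDoc view) can be illustrated with a toy class. The names and the doubling growth policy below are hypothetical, not the patch's actual code.

```java
// Hypothetical sketch of the point-in-time snapshot idea discussed above:
// the writer appends norms to a growing array, and each reader snapshot
// copies the first maxDoc entries with System.arraycopy, so later writes
// never change what an already-open reader sees.
import java.util.Arrays;

public class NormsBuffer {
    private byte[] norms = new byte[16];
    private int maxDoc = 0;

    public synchronized void addDoc(byte norm) {
        if (maxDoc == norms.length) {
            // Grow the backing array; older snapshots are unaffected
            norms = Arrays.copyOf(norms, norms.length * 2);
        }
        norms[maxDoc++] = norm;
    }

    /** Point-in-time snapshot: fixed length, private copy of the norms. */
    public synchronized byte[] snapshot() {
        byte[] copy = new byte[maxDoc];
        System.arraycopy(norms, 0, copy, 0, maxDoc);
        return copy;
    }
}
```

The per-snapshot arraycopy is the cost the comment argues is acceptable, since arraycopy compiles down to a fast intrinsic on modern JVMs.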
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979633#action_12979633 ] Michael Busch commented on LUCENE-2312: --- bq. I believe the goal for RT readers is still point in time reader semantics. True. At Twitter our RT solution also guarantees point-in-time readers (with one exception; see below). We have to provide at least a fixed maxDoc per-query to guarantee consistency across terms (posting lists). Eg. imagine your query is 'a AND NOT b'. Say a occurs in doc 100. Now you don't find a posting in b's posting list for doc 100. Did doc 100 not have term b, or is doc 100 still being processed and that particular posting hasn't been written yet? If the reader's maxDoc however is set to 99 (the last completely indexed document) you can't get into this situation. Before every query we reopen the readers, which effectively simply updates the maxDoc. The one exception to point-in-time-ness is the df values in the dictionary, which for obvious reasons is tricky. I think a straightforward way to solve this problem is to count the df by iterating the corresponding posting list when requested. We could add a special counting method that just uses the skip lists to perform this task. Here the term buffer becomes even more important, and also documenting that docFreq() can be expensive in RT mode, ie. not O(1) as in non-RT mode, but rather O(log indexSize) in case we can get multi-level skip lists working in RT. 
Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch In order to offer user's near realtime search, without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Todays Lucene based NRT systems must incur the cost of merging segments, which can slow indexing. Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table. Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
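The bounded docFreq idea in the comment above (the reader's fixed maxDoc caps which postings are visible, so df is counted on demand rather than kept point-in-time) can be shown with a toy posting list. Names here are hypothetical, and a real implementation would use skip lists instead of this linear scan, as the comment suggests.

```java
// Hypothetical illustration of the bounded-docFreq idea: the RT reader's
// fixed maxDoc caps which postings are visible, so df is counted by
// walking the term's posting list and stopping at the first doc >= maxDoc.
// (Skip lists would let a real implementation avoid touching every posting.)
public class RtDocFreq {
    /**
     * postings: ascending doc ids for one term;
     * maxDoc: one past the last completely indexed document.
     */
    static int docFreq(int[] postings, int maxDoc) {
        int df = 0;
        for (int doc : postings) {
            if (doc >= maxDoc) {
                break; // doc may still be mid-indexing: invisible to this reader
            }
            df++;
        }
        return df;
    }
}
```

This mirrors the 'a AND NOT b' example: a posting for doc 100 simply does not exist for a reader whose maxDoc is 100, so the ambiguity about half-written postings never arises.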
[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979635#action_12979635 ] Steven Rowe commented on LUCENE-2657: - bq. IIUC, this will be a parallel build system to ant. The build and test is independent of anything the ant build does. Yes, except that the two systems share build output directories. bq. If we take this route, we should probably drop the -pom.xml.templates and the and --generate-maven-artifacts target. I agree. The -pom.xml.templates have never been fully correct (e.g. missing dependencies) and are unmaintained. I'm working on replacing generate-maven-artifacts. Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. 
From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979637#action_12979637 ] Steven Rowe commented on LUCENE-2611: - bq. I've used the copyright plugin a lot and its a great way to ensure that the ASL is added to any new files. Might be useful to add it to reduce the hassle for new contributors. OK, I'll investigate. IntelliJ IDEA and Eclipse setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x-part2.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming. The attached patches add a new top level directory {{dev-tools/}} with sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, as well as top-level ant targets named idea and eclipse that copy these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit run configuration per module is included. The Eclipse configuration includes a source entry for each source/test/resource location and classpath setup: a library entry for each jar. 
For IDEA, once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. For Eclipse, once {{ant eclipse}} has been run, the user has to refresh the project (right-click on the project and choose Refresh). If these patches are committed, Subversion svn:ignore properties should be added/modified to ignore the destination IDEA and Eclipse configuration locations. Iam Jambour has written up on the Lucene wiki a detailed set of instructions for applying the 3.X branch patch for IDEA: http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979643#action_12979643 ] Steven Rowe commented on LUCENE-2611: - bq. I don't know why the vcs.xml change isn't working for you, but it's absolutely wonderful for the commit log history to show the JIRA references as links that work. I agree, that would be nice. bq. FWIW, I'm using IntelliJ 10. I'm running both ATM, in part to ensure that the configuration provided by this issue works under both IntelliJ 9 and 10. I haven't tried the log comment issue auto-linkification yet under IntelliJ 10. I *do* see auto-linkified issue IDs in the Repository Changes view, as well as in the [Version Control | Subversion | Show History] view in IntelliJ 9 - very nice! (Just not in the log comment editor or in the svnbar plugin's SVN Diff popup.)
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979646#action_12979646 ] Jason Rutherglen commented on LUCENE-2312: -- {quote}The one exception to point-in-time-ness are the df values in the dictionary, which for obvious reasons is tricky.{quote} Right, forgot about those. I think we'd planned on using a multi-dimensional array, eg int[][]. However we'd need to test how they'll affect indexing performance. If that doesn't work then we'll need to think about other solutions like building them on demand, which is offloading the problem somewhere else. It looks like docFreq is used only for phrase queries? However I think paying a potentially small penalty during indexing (only when RT is on) is better than a somewhat random penalty during querying. Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch In order to offer user's near realtime search, without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Todays Lucene based NRT systems must incur the cost of merging segments, which can slow indexing. Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. 
Currently IW implements a specialized hash table. Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
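The int[][] idea from the comment above can be sketched as a paged doc-frequency map: term IDs index into fixed-size pages, and the page directory grows without reallocating already-published pages. This is only an illustration of the data structure being discussed, not Lucene's actual internals; the class and method names are hypothetical.

```java
/** Hypothetical paged doc-frequency map: termID -> df.
 * Existing pages are shared (never copied) when the directory grows,
 * which is what would keep old entries stable for concurrent readers. */
class PagedDocFreqs {
    private static final int PAGE_SIZE = 1024;
    private int[][] pages = new int[0][];

    /** Called during indexing when a term occurs in a new document. */
    synchronized void increment(int termID) {
        int page = termID / PAGE_SIZE;
        if (page >= pages.length) {
            // grow only the page directory; existing pages are reused as-is
            int[][] newPages = new int[page + 1][];
            System.arraycopy(pages, 0, newPages, 0, pages.length);
            pages = newPages;
        }
        if (pages[page] == null) {
            pages[page] = new int[PAGE_SIZE];
        }
        pages[page][termID % PAGE_SIZE]++;
    }

    int docFreq(int termID) {
        int page = termID / PAGE_SIZE;
        if (page >= pages.length || pages[page] == null) return 0;
        return pages[page][termID % PAGE_SIZE];
    }
}
```

The per-increment cost is a division and an array write, which is the "potentially small penalty during indexing" the comment is weighing against query-time reconstruction.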
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979649#action_12979649 ] Michael Busch commented on LUCENE-2324: --- bq. Longer term c) would be great, or, if IW has an ES then it'd send multiple flush jobs to the ES. Lost in abbreviations :) - Can you remind me what 'ES' is? bq. But, you're right: maybe we should sometimes prune DWPTs. Or simply stop recycling any RAM, so that a just-flushed DWPT is an empty shell. I'm not sure I understand what the problem here with recycling RAM is. Could someone elaborate? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979654#action_12979654 ] Michael Busch commented on LUCENE-2324: --- bq. I think aborting a flush should only lose the docs in that one DWPT (as it is today). Yeah I'm convinced now I don't want the nuke the world approach. Btw, Mike, you're very good with giving things intuitive names :) bq. I think on commit if we hit an aborting exception flushing a given DWPT, we throw it then there. Yes sounds good. {quote} bq. Any segs already flushed remain flushed (but not committed). Any segs not yet flushed remain not yet flushed... If the segments are flushed, then they will be deleted? Or will they be made available in a subsequent and completely successful commit? {quote} The aborting exception might be thrown due to a disk-full situation. This can be fixed and commit() called again, which then would flush the remaining DWPTs and commit all flushed segments. Otherwise, those flushed segments will be orphaned and deleted sometime later by a different IW because they don't belong to any SegmentInfos.
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979655#action_12979655 ] Jason Rutherglen commented on LUCENE-2324: -- bq. Lost in abbreviations - Can you remind me what 'ES' is? I read it as ExecutorService, ie, a thread pool. bq. I'm not sure I understand what the problem here with recycling RAM is. Could someone elaborate? Mainly that we could have DWPT(s) lying around unused, consuming [recycled] RAM, eg, from a sudden drop in the number of incoming threads after a flush. This is a drop the code, and put it back in if that was a bad idea solution.
[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries
[ https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio Verni updated SOLR-2307: Attachment: TestPHPSerializedResponseWriter.java PHPSerializedResponseWriter.java.patch Patch update to handle single values in multi valued fields. Added a junit test case just to test the issue. PHPSerialized fails with sharded queries Key: SOLR-2307 URL: https://issues.apache.org/jira/browse/SOLR-2307 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 1.3, 1.4.1 Reporter: Antonio Verni Priority: Minor Attachments: PHPSerializedResponseWriter.java.patch, PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java Solr throws a java.lang.IllegalArgumentException: Map size must not be negative exception when using the PHP Serialized response writer with sharded queries. To reproduce the issue start your preferred example and try the following query: http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr It is caused by the JSONWriter implementation of writeSolrDocumentList and writeSolrDocument. Overriding these two methods in the PHPSerializedResponseWriter to handle the SolrDocument size seems to solve the issue. Attached my patch made against trunk rev 1055588. cheers, Antonio
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979671#action_12979671 ] Michael Busch commented on LUCENE-2324: --- {quote} Mainly that we could have DWPT(s) lying around unused, consuming [recycled] RAM, eg, from a sudden drop in the number of incoming threads after a flush. This is a drop the code, and put it back in if that was a bad idea solution. {quote} Ah thanks, got it. bq. Or simply stop recycling any RAM, so that a just-flushed DWPT is an empty shell. +1
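The "empty shell" option that gets the +1 above can be sketched in miniature: after a flush the per-thread writer drops all internal state rather than holding recycled buffers, so an idle DWPT pins no memory. This is a toy model under assumed names (WriterSlot standing in for a DWPT), not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-thread writer state: after flush it becomes an
 * empty shell instead of keeping recycled buffers around. */
class WriterSlot {
    private List<String> buffered = new ArrayList<>();

    void addDocument(String doc) { buffered.add(doc); }

    int ramDocs() { return buffered.size(); }

    /** Flush: hand off the buffered docs as a "segment" and drop all
     * internal state, so an idle slot consumes no RAM afterwards. */
    List<String> flush() {
        List<String> segment = buffered;
        buffered = new ArrayList<>(); // fresh empty shell, nothing recycled
        return segment;
    }
}
```

The trade-off being discussed: recycling saves allocation cost when thread counts are steady, while the empty-shell approach avoids unused DWPTs holding RAM after a drop in incoming threads.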
[jira] Commented: (SOLR-2307) PHPSerialized fails with sharded queries
[ https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979683#action_12979683 ] Ahmet Arslan commented on SOLR-2307: This patch solves this problem http://search-lucene.com/m/lr7t42uWp4g , right?
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979700#action_12979700 ] Michael McCandless commented on LUCENE-2474: I started to implement the forwards to all subs and to all reopened readers and... it's kinda hairy. I mean there are TONS of places where we make new readers (Earwin's working on improving this, I think, under LUCENE-2355). So then I wondered: what if we just make this a static method, eg on IndexReader, add/removeReaderFinishedListener? (Or we could put it on FieldCache). That'd be a tiny change... Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Attachments: LUCENE-2474.patch Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, its make a lot of sense to cache things based on IndexReader#getFieldCacheKey, even Lucene itself uses it, for example, with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows to plug a CacheEvictionListener which will be called when the cache should be purged for an IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
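The "tiny change" Mike proposes above — a static add/removeReaderFinishedListener — can be sketched as a global listener registry notified when a reader is finished, letting external caches keyed on the reader evict eagerly. The class and method names below are assumptions for illustration, not the API that was eventually committed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical static listener registry: callers register once,
 * globally, and are notified with a reader's cache key when that
 * reader is finished, so per-reader caches can be purged. */
class ReaderEvents {
    interface ReaderFinishedListener { void finished(Object readerKey); }

    // copy-on-write keeps notification iteration safe without locks
    private static final List<ReaderFinishedListener> listeners =
        new CopyOnWriteArrayList<>();

    static void addReaderFinishedListener(ReaderFinishedListener l) {
        listeners.add(l);
    }

    static void removeReaderFinishedListener(ReaderFinishedListener l) {
        listeners.remove(l);
    }

    /** A reader would call this from its close path with its cache key. */
    static void notifyFinished(Object readerKey) {
        for (ReaderFinishedListener l : listeners) l.finished(readerKey);
    }
}
```

Being static, this avoids the "tons of places where we make new readers" problem: nothing needs to be forwarded to subreaders or reopened readers.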
[jira] Closed: (LUCENE-2851) Highlighting in UTF-8 documents
[ https://issues.apache.org/jira/browse/LUCENE-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maricris Villareal closed LUCENE-2851. -- Resolution: Not A Problem Sorry, it turned out that the issue was in the way I was opening the file. BufferedReader br = new BufferedReader(new FileReader(pageFile)); was changed to BufferedReader br = new BufferedReader(new java.io.InputStreamReader(new java.io.FileInputStream(pageFile), "UTF-8")); and it worked. Highlighting in UTF-8 documents --- Key: LUCENE-2851 URL: https://issues.apache.org/jira/browse/LUCENE-2851 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Affects Versions: 2.9.3 Reporter: Maricris Villareal When I try to highlight a Chinese document using org.apache.lucene.search.highlight.Highlighter, I end up with a corrupted document from getBestTextFragments(). This corruption happens both when I try to highlight a pure English query and when I try to highlight a pure Chinese query. I believe that this issue is related to LUCENE-1500, which was closed by preventing an exception from being thrown but did not fix the underlying problem.
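The root cause in the comment above is that FileReader silently uses the platform default charset, corrupting UTF-8 text on JVMs whose default is, say, Latin-1. A minimal sketch of the charset-explicit idiom (using java.nio, a more modern equivalent of the InputStreamReader fix quoted in the comment; the helper name is made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8ReadDemo {
    /** Read a whole file with an explicitly stated charset.
     * new FileReader(file) offers no way to say "this file is UTF-8",
     * so Chinese text round-trips correctly only by luck of the JVM default. */
    static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int c;
            while ((c = br.read()) != -1) sb.append((char) c);
        }
        return sb.toString();
    }
}
```

StandardCharsets.UTF_8 also avoids the UnsupportedEncodingException that the string-literal "UTF-8" constructor forces you to handle.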
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979713#action_12979713 ] Michael McCandless commented on LUCENE-2324: {quote} bq. Lost in abbreviations - Can you remind me what 'ES' is? I read it as ExecutorService, ie, a thread pool. {quote} Yes, sorry that's what I meant. Ie someday IW can take an ES too and farm things out to it when it could make use of concurrency (like flush the world). But that's for later /dream.
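The "IW takes an ES and farms out flush jobs" idea above maps directly onto java.util.concurrent: each independent per-thread segment becomes a Callable submitted to the caller-supplied ExecutorService, and "flush the world" is just invokeAll. A sketch under assumed names (ParallelFlush, flushAll), not anything in the patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical "flush the world" on a caller-supplied ExecutorService:
 * each DWPT-like segment flushes as an independent task. */
class ParallelFlush {
    static List<String> flushAll(List<Callable<String>> flushJobs, ExecutorService es)
            throws InterruptedException, ExecutionException {
        List<String> flushedSegments = new ArrayList<>();
        // invokeAll blocks until every flush job has completed,
        // and returns futures in the same order as the jobs
        for (Future<String> f : es.invokeAll(flushJobs)) {
            flushedSegments.add(f.get());
        }
        return flushedSegments;
    }
}
```

Handing the pool in from outside (rather than IW owning threads) lets the application share one pool across writers and bound total flush concurrency.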
[jira] Commented: (SOLR-2307) PHPSerialized fails with sharded queries
[ https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979715#action_12979715 ] Antonio Verni commented on SOLR-2307: - Yes, exactly, and it could also fix SOLR-2278, but I haven't tested that. As I discovered reading your SOLR-2291, the current patch I uploaded does not respect the returnFields parameter. I've fixed it, but I need to upload a third version of the patch (so please excuse the mess) and a new test file. In detail, to fix the issue I've added a numeric index to the response array of documents in writeSolrDocumentList, as required by the PHP serialization protocol, and handled the field count in writeSolrDocument.
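The two details named in the comment above — explicit numeric indexes and an exact field count — exist because PHP's serialize() format prefixes every array with its element count, and unserialize() rejects input where the count does not match (hence the "Map size must not be negative" symptom when it goes wrong). A minimal sketch of that wire format, not Solr's actual PHPSerializedResponseWriter; the class is hypothetical and byte-counting is simplified to ASCII:

```java
import java.util.List;
import java.util.Map;

/** Minimal sketch of the PHP serialize() wire format the writer must emit. */
class PhpSerialize {
    static String string(String s) {
        // PHP counts bytes; assume ASCII here for brevity
        return "s:" + s.length() + ":\"" + s + "\";";
    }

    /** Serialize a doc list as a numerically indexed array of field maps. */
    static String docList(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("a:").append(docs.size()).append(":{");
        for (int i = 0; i < docs.size(); i++) {
            sb.append("i:").append(i).append(";");           // explicit numeric index
            Map<String, String> doc = docs.get(i);
            sb.append("a:").append(doc.size()).append(":{"); // count of fields actually written
            for (Map.Entry<String, String> e : doc.entrySet()) {
                sb.append(string(e.getKey())).append(string(e.getValue()));
            }
            sb.append("}");
        }
        return sb.append("}").toString();
    }
}
```

This also shows why respecting returnFields matters: the `a:<n>:` count must reflect the fields actually emitted, not the document's full field set.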
[jira] Commented: (LUCENE-2795) Genericize DirectIOLinuxDir - UnixDir
[ https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979719#action_12979719 ] Michael McCandless commented on LUCENE-2795: https://issues.apache.org/jira/browse/LUCENE-2500?focusedCommentId=12979588page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12979588 has details on what flags to pass to OS X to bypass its buffer cache... Genericize DirectIOLinuxDir - UnixDir -- Key: LUCENE-2795 URL: https://issues.apache.org/jira/browse/LUCENE-2795 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Michael McCandless Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to use it for indexWriter and not IndexReader (searching). It's a trap. But, once we do LUCENE-2793, we can make it fully general purpose because then a single native Dir impl can be used. I'd also like to make it generic to other Unices, if we can, so that it becomes UnixDirectory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache
[ https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979718#action_12979718 ] Michael McCandless commented on LUCENE-2500: Nice! Actually I'd like to generalize this Dir impl to be a UnixFSDirectory (adding ifdefs to handle the flags for the various flavors), and, fix it, once we have IOContext, to properly decide when to use direct IO and when not to. This way it's safe to just use on any Unix platform... (see LUCENE-2795). A Linux-specific Directory impl that bypasses the buffer cache -- Key: LUCENE-2500 URL: https://issues.apache.org/jira/browse/LUCENE-2500 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2500.patch I've been testing how we could prevent Lucene's merges from evicting pages from the OS's buffer cache. I tried fadvise/madvise (via JNI) but (frustratingly), I could not get them to work (details at http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html). The only thing that worked was to use Linux's O_DIRECT flag, which forces all IO to bypass the buffer cache entirely... so I created a Linux-specific Directory impl to do this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979720#action_12979720 ] Jason Rutherglen commented on LUCENE-2474: -- {quote}make this a static method, eg on IndexReader, add/removeReaderFinishedListener? (Or we could put it on FieldCache). That'd be a tiny change...{quote} This makes the most sense, however it feels temporary, as we should probably move to a unified IWC/IRC config where all parameters are set and shared for writers and readers? This way we can eventually coordinate things like IO scheduling, eg, LUCENE-2793's IOContext. Also shouldn't there simply be a reader event listener and perhaps even a writer event listener?
[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries
[ https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio Verni updated SOLR-2307: Attachment: TestPHPSerializedResponseWriter.java PHPSerializedResponseWriter.java.patch The previous implementation was not respecting the returnFields parameter. The changes are reflected in the test code.
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979728#action_12979728 ] Jason Rutherglen commented on LUCENE-2793: -- Shall I take this one? With the plan being to add config options to IWC so that IW uses the DirectIOLinuxDirectory (and its variants) only for merging? Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Michael McCandless Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but could then hold other flags like DIRECT (bypass the OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible.
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979731#action_12979731 ] Michael McCandless commented on LUCENE-2793: Yes please! But, this issue only adds the IOContext, threading it down to when you open an input / create an output. That context should hold enough information to allow the Dir impl to make decisions like buffer sizes and avoiding buffer cache, etc.
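The IOContext being discussed here can be pictured as a small immutable value object threaded down to Directory.createOutput/openInput so the Directory can choose buffer sizes and flags per use. The following is a minimal, self-contained Java sketch; the names (Usage, forMerge, forRead, the flag set, the buffer sizes) are illustrative assumptions from the thread's wording, not the API that was eventually committed.

```java
// Hypothetical sketch of the IOContext idea: a value class carrying enough
// information for a Directory implementation to pick buffer sizes and to
// decide whether to bypass the OS buffer cache (DIRECT) or hint sequential
// access (SEQUENTIAL). All names and numbers are illustrative.
class IOContext {
    enum Usage { READ, MERGE, FLUSH }

    final Usage usage;
    final int bufferSize;     // e.g. larger for merges than for searches
    final boolean direct;     // bypass the OS's buffer cache
    final boolean sequential; // hint: access will be sequential

    IOContext(Usage usage, int bufferSize, boolean direct, boolean sequential) {
        this.usage = usage;
        this.bufferSize = bufferSize;
        this.direct = direct;
        this.sequential = sequential;
    }

    /** A merge context: big buffers, direct + sequential I/O. */
    static IOContext forMerge() {
        return new IOContext(Usage.MERGE, 4 * 1024 * 1024, true, true);
    }

    /** A search context: modest buffers, normal cached I/O. */
    static IOContext forRead() {
        return new IOContext(Usage.READ, 1024, false, false);
    }
}
```

This also makes the comment about IW's ReaderPool concrete: if the context participates in the pool's cache key, a reader opened with a merge context is never handed out for searching.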
[jira] Commented: (SOLR-975) admin-extra.html not correctly displayed when using multicore configuration
[ https://issues.apache.org/jira/browse/SOLR-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979730#action_12979730 ] Edward Rudd commented on SOLR-975: -- I have confirmed this issue is fixed in the 4.0 nightly build from today. admin-extra.html not correctly displayed when using multicore configuration - Key: SOLR-975 URL: https://issues.apache.org/jira/browse/SOLR-975 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 1.4 Environment: Jetty openjdk 1.6.0 1.0.b12 (EPEL package for EL5) Reporter: Edward Rudd I'm having cross-talk issues using the Solr nightlies (and probably with the 1.3.0 release, but I have not tested, as I needed newer features of the DataImportHandler in the nightlies). The basic scenario for this bug is as follows: I have two cores configured and BOTH have a customized admin-extra.html; however, going to the admin pages uses the SAME admin-extra.html for all cores. The one used is whichever core is browsed first. This looks like a caching bug where the cache is not taking the core into account. Basically, my admin-extra.html has a link to the data importer script and a link to reload the core (which has to have the core name explicitly in the per-core admin-extra.html).
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979733#action_12979733 ] Michael McCandless commented on LUCENE-2474: I think we can generalize this to any event and to writer in the future... for today, just letting something external be notified when a reader is gone, just as FieldCache is privately notified today, is a good baby step.
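The "baby step" described here — a static registry that notifies external code when a reader is gone — can be sketched as below. The interface and method names (ReaderFinishedListener, addReaderFinishedListener) follow the wording used earlier in the thread, but the code is an illustration only, not the committed IndexReader API; the demo class is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Sketch of a static reader-finished listener registry. An external cache
// keyed on the reader's cache key (cf. getFieldCacheKey) registers a
// listener and evicts eagerly when the reader is truly closed, the same way
// FieldCache is purged privately today. Names are illustrative assumptions.
class ReaderFinishedDemo {
    interface ReaderFinishedListener {
        void finished(Object readerCacheKey);
    }

    private static final Set<ReaderFinishedListener> listeners =
        new CopyOnWriteArraySet<>();

    static void addReaderFinishedListener(ReaderFinishedListener l) {
        listeners.add(l);
    }

    static void removeReaderFinishedListener(ReaderFinishedListener l) {
        listeners.remove(l);
    }

    /** Would be called internally when a reader's refcount hits zero. */
    static void notifyReaderFinished(Object readerCacheKey) {
        for (ReaderFinishedListener l : listeners) {
            l.finished(readerCacheKey);
        }
    }
}
```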
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979736#action_12979736 ] Jason Rutherglen commented on LUCENE-2793: -- {quote}this issue only adds the IOContext, threading it down to when you open an input / create an output{quote} Does this mean we're not implementing this part? {quote}This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different - we could in theory share del docs, norms, etc, if that were somehow possible.{quote}
[jira] Commented: (SOLR-2307) PHPSerialized fails with sharded queries
[ https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979737#action_12979737 ] Ahmet Arslan commented on SOLR-2307: By the way, I think you should call req.close(); at the end of the test.
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979738#action_12979738 ] Michael McCandless commented on LUCENE-2793: Sorry, I think we should also do that as part of this issue. Basically the IOContext needs to become part of the cache key used in IW's ReaderPool?
[jira] Created: (LUCENE-2856) Create IndexWriter event listener, specifically for merges
Create IndexWriter event listener, specifically for merges -- Key: LUCENE-2856 URL: https://issues.apache.org/jira/browse/LUCENE-2856 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Jason Rutherglen This issue will allow users to monitor merges occurring within IndexWriter using a callback notifier event listener. This can be used by external applications such as Solr to monitor large segment merges.
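A callback-notifier listener of the kind LUCENE-2856 proposes might take the following minimal shape. The interface, method names, and event payload below are all illustrative assumptions for the sketch, not a committed IndexWriter API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical observer shape: the writer would notify registered listeners
// around each merge, letting an application such as Solr log or react to
// large segment merges. All names and payload fields are illustrative.
class MergeEvents {
    interface MergeListener {
        void mergeStarted(int segmentCount, long estimatedBytes);
        void mergeFinished(int segmentCount, long mergedBytes);
    }

    private final List<MergeListener> listeners = new CopyOnWriteArrayList<>();

    void addMergeListener(MergeListener l) {
        listeners.add(l);
    }

    // The writer would call these around each merge it runs.
    void notifyStarted(int segmentCount, long estimatedBytes) {
        for (MergeListener l : listeners) l.mergeStarted(segmentCount, estimatedBytes);
    }

    void notifyFinished(int segmentCount, long mergedBytes) {
        for (MergeListener l : listeners) l.mergeFinished(segmentCount, mergedBytes);
    }
}
```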
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979740#action_12979740 ] Jason Rutherglen commented on LUCENE-2793: -- bq. Basically the IOContext needs to become part of the cache key used in IW's ReaderPool? Great, I'll implement this.
[jira] Created: (LUCENE-2857) Fix various problems with PulsingCodec
Fix various problems with PulsingCodec -- Key: LUCENE-2857 URL: https://issues.apache.org/jira/browse/LUCENE-2857 Project: Lucene - Java Issue Type: Improvement Components: Codecs Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0
[jira] Updated: (LUCENE-2857) Fix various problems with PulsingCodec
[ https://issues.apache.org/jira/browse/LUCENE-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2857: --- Attachment: LUCENE-2857.patch Patch. I changed PulsingCodec to: * Not use absurd RAM when cloning TermState * Not decode the byte[] entry in the terms dict until a docs/positions enum is needed * Use total TF (the number of term positions across all docs) as the threshold for storing in the terms dict vs the wrapped codec This fixes the intermittent failure in TestIndexWriterOnJRECrash.testNRTThreads that we've seen lately.
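The third bullet — switching the inlining ("pulsing") decision from document count to total term frequency — can be illustrated with a toy predicate. The method names and the example numbers below are illustrative only, not the patch's actual code.

```java
// Toy illustration of the pulsing-threshold change. A term is "pulsed"
// (its postings inlined into the terms dict) only while small; the question
// is what "small" should measure. Names and thresholds are illustrative.
class PulsingThresholdDemo {
    // Old-style policy (sketch): inline when the term appears in few docs.
    // A stop word occurring once in one doc but with 100,000 positions
    // would still be inlined, blowing up RAM when the TermState is cloned.
    static boolean inlineByDocFreq(int docFreq, int maxDocs) {
        return docFreq <= maxDocs;
    }

    // New-style policy (sketch): inline only when the total number of
    // positions across all docs is small, regardless of doc count.
    static boolean inlineByTotalTF(long totalTermFreq, long maxTF) {
        return totalTermFreq <= maxTF;
    }
}
```

For a term in 1 document with 100,000 positions, the doc-count policy inlines it while the total-TF policy does not, which is the failure mode the Greek-stop-word index hit.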
[jira] Resolved: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2829. Resolution: Fixed improve termquery pk lookup performance - Key: LUCENE-2829 URL: https://issues.apache.org/jira/browse/LUCENE-2829 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Robert Muir Assignee: Michael McCandless Fix For: 3.1 Attachments: LUCENE-2829.patch, LUCENE-2829.patch, LUCENE-2829.patch For things that are like primary keys and don't exist in some segments (the worst case is a primary/unique key that only exists in 1) we do wasted seeks. While LUCENE-2694 tries to solve some of this issue with TermState, I'm not sure we could ever backport that to 3.1, for example. This is a simpler solution just to solve this one problem in TermQuery... we could just revert it in trunk when we resolve LUCENE-2694, but I don't think we should leave things as they are in 3.x
[jira] Created: (LUCENENET-387) Not able to sort by Title
Not able to sort by Title - Key: LUCENENET-387 URL: https://issues.apache.org/jira/browse/LUCENENET-387 Project: Lucene.Net Issue Type: Bug Environment: Lucene.net incorporated in Sitecore CMS 6.2, OS: Windows Server 2008, IE 8 Reporter: Reshmi Kumari Sorting by date is working perfectly fine, but I am not able to sort by Title. I have indexed Title but not tokenized it: <field target="sortTitle" indexType="untokenized">Title</field>. For sorting I have used: sort = new Sort(new SortField("sortTitle", SortField.STRING));
Re: Lucene-trunk - Build # 1421 - Failure
OK I got TestNRTThreads (alone, no crashing) to fail, once I added a CheckIndex to it (and ran under while(1)). So this is not particular to crashing... it's just because PulsingCodec was using crazy RAM on cloning its TermState. I fixed this in LUCENE-2857. Mike On Mon, Jan 10, 2011 at 6:26 AM, Michael McCandless luc...@mikemccandless.com wrote: OK, so this looks to be caused by 1) the fact that we are indexing Greek stop words with the TestNRTThreads test, and 2) the Pulsing codec being horribly inefficient in how it handles pulsed terms that have many, many positions. But it's odd we've only hit this failure in the JRE crash test... I've added a CheckIndex to TestNRTThreads itself and I'll see if that too can provoke an OOME. It should. Mike On Mon, Jan 10, 2011 at 5:14 AM, Michael McCandless luc...@mikemccandless.com wrote: OK I snatched this index and indeed I can reproduce the OOME during CheckIndex! The hunt begins :) Mike On Sun, Jan 9, 2011 at 10:35 PM, Robert Muir rcm...@gmail.com wrote: On Sun, Jan 9, 2011 at 9:40 PM, Apache Hudson Server hud...@hudson.apache.org wrote: Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads Error Message: CheckIndex failed maybe this is specific to pulsing? I noticed it has failed 3 times with this identical pulsing stacktrace: Lucene-trunk/1421, tests-only/3590, tests-only/3570 However, this time it failed in a nightly build (perhaps the indexes are still available on the hudson machine if we salvage them before the next nightly build?)
it should be under lucene/build/test/N/jrecrashXXtmp/ all 3 times the stacktrace is: test: terms, freq, prox...ERROR [Java heap space] java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Position.clone(PulsingPostingsWriterImpl.java:104) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Document.clone(PulsingPostingsWriterImpl.java:74) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingTermState.clone(PulsingPostingsReaderImpl.java:72) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingDocsEnum.reset(PulsingPostingsReaderImpl.java:234) at org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl.docs(PulsingPostingsReaderImpl.java:189) at org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.docs(PrefixCodedTermsReader.java:515) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:756) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:489) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:83) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137) at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
Lucene-Solr-tests-only-trunk - Build # 3634 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3634/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads Error Message: CheckIndex failed Stack Trace: java.lang.RuntimeException: CheckIndex failed at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131) at org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137) at org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049) Build Log (for compile errors): [...truncated 3053 lines...]
Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool
The amount of custom work required for the conversion is starting to concern me a bit. Well, to clarify, the work itself doesn't concern me, but rather I'm worried that this is going to make a purely automated conversion process very difficult to pull off and probably very fragile. The devil is definitely in the details. What are thoughts concerning how we can begin to tackle this? How many of these issues can be handled by Sharpen, or a modified, custom version of Sharpen? What items are best handled by a pre/post processor? A number of the items DIGY listed (thanks!) seem to fall under the scope of code intent, vs pure syntactical mapping. I'd suggest that it's unrealistic to expect any conversion tool to manage those types of issues. Perhaps a process such as the following should be our initial draft: 1) Start with Lucene.Java source, initially the latest 3.0.3 release. 2) Make specific hand coded changes to the java source code to assist with certain automated conversion issues. These changes should be expressed as a set of patch files, to be automatically applied to the java source on subsequent iterations of this process. Any patch rejections should break the build. These patches should be maintained as new code updates come from the java source. 3) Run an automated conversion tool (Sharpen most likely.) 4) Perform any desired post processing to modify the source code structure, setup project / solution files, etc. Essentially, get the project into a state that it's loadable by Visual Studio. At this point there will be errors (lots of them.) The output of this step should be checked in as the raw conversion source. 5) Make changes to the converted C# code, including necessary helper classes, in order to fix all the remaining issues alluded to by DIGY. 
Also, run any automated post processing, such as Resharper code formatting (the formatting settings should be standardized across the project to ensure normalized and repeatable refactorings), inline docs tweaks, etc. These changes should also be expressed as a set of patch files, to be automatically applied to the raw conversion source on subsequent iterations of this process. Any patch rejections should break the build. These patches should represent the bulk of the efforts of the Lucene.Net core dev team. The output of this step should be checked in as the official Lucene.Net source code. This entire process needs to be checked into a conversion process branch. After the initial build of this system, workflow would be split into the following 2 vectors: A) On java source changes (probably at a coarser level than individual commits), steps 1-4 would be run to build a new base raw conversion source. With the java changes, it's possible that changes to the patch files in step 2 would be required. Then step 5 would be run to create the official Lucene.Net source. Again, fixes to the patches may be in order depending on the complexity of the original java changes. B) Most other changes would be considered C#-side specific. This might involve platform specific bug fixes, desired code refactorings, etc. These changes would be made based on the current checked in Lucene.Net source, and the patch files for step 5 would be updated to reflect those changes. Conversion process changes would fall outside the scope of standard development, being fairly disruptive. Of course, this process does complicate the development / maintenance process quite a bit, by making many more vectors of change. And, I'm aware that what I've blathered on about here has probably already been discussed, but I wanted to get some discussion going. Thoughts?
Peter Mateja peter.mat...@gmail.com On Sun, Jan 9, 2011 at 4:09 PM, Digy digyd...@gmail.com wrote: Having buildable clean code is just a beginning and should not result in a loss of know-how. Before trying to fix the bugs in the output of these tools, everyone should see how they were fixed in Lucene.Net 2.9.2. There is no need to reinvent the wheel. Here is a quick list of tips & tricks as far as I can remember. * The decimal separator is not always "."; some locales use "," (while parsing float/double). * Set in Java accepts null as an argument. A null check is needed while porting. * ReadResolve should be ported by implementing the interface System.Runtime.Serialization.IObjectReference: public Object GetRealObject(System.Runtime.Serialization.StreamingContext context) { return ReadResolve(); } * .NET emits \ufffd as the invalid char but Java emits \x00 * Use StringComparer.Ordinal while comparing strings. * FIPS compliance: use SHA1 instead of MD5. * Use the System.Runtime.Serialization.OnDeserialized attribute on Serializable classes: void OnDeserialized(System.Runtime.Serialization.StreamingContext context) { - } * Use System.IO.Path.DirectorySeparatorChar or Path.Combine instead of using "\\". (causes problems on
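DIGY's first tip — the locale-dependent decimal separator — is easy to hit in any port. A small, self-contained illustration follows (in Java, for consistency with the rest of this digest, even though the original tip concerns the C# side; the class and method names are mine):

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Demonstrates why number parsing must pin an explicit locale: the decimal
// separator is "." in some locales and "," in others, so relying on the
// platform default silently changes parse results across machines.
class DecimalSeparatorDemo {
    /** Parse with an invariant locale, never the platform default. */
    static double parsePortable(String s) {
        try {
            return NumberFormat.getInstance(Locale.ROOT).parse(s).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Parse with a specific locale, to show the separator difference. */
    static double parseLocal(String s, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(s).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, "3,14" parses as 3.14 under a German locale but not under the invariant one, which is exactly the class of bug the tip warns about.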
[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries
[ https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio Verni updated SOLR-2307: Comment: was deleted (was: The previous implementation was not respecting the returnField parameter. Changes reflected in test code)
[jira] Resolved: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2837. Resolution: Fixed 3rd time's a charm? Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher --- Key: LUCENE-2837 URL: https://issues.apache.org/jira/browse/LUCENE-2837 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2837.patch, LUCENE-2837.patch, LUCENE-2837.patch We've discussed cleaning up our *Searcher stack for some time... I think we should try to do this before releasing 4.0. So I'm attaching an initial patch which: * Removes Searcher, Searchable, absorbing all their methods into IndexSearcher * Removes contrib/remote * Removes MultiSearcher * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now pass useThreads=true, or a custom ES to the ctor) The patch is rough -- I just ripped stuff out, did search/replace to IndexSearcher, etc. EG nothing is directly testing using threads with IndexSearcher, but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cut over tests to use this instead of making their own IndexSearcher. I think MultiSearcher has a useful purpose, but as it is today it's too low-level, eg it shouldn't be involved in rewriting queries: the Query.combine method is scary. Maybe in its place we make a higher-level class, with a limited API, that's able to federate search across multiple IndexSearchers? It'd also be able to optionally use a thread per IndexSearcher.
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2474: --- Attachment: LUCENE-2474.patch OK, here's a patch exposing the readerFinishedListeners as static methods on IndexReader. It was also nice to consolidate all the various places we were previously purging the FieldCache.
Lucene-3.x - Build # 238 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/238/ All tests passed Build Log (for compile errors): [...truncated 21034 lines...]
[jira] Resolved: (LUCENE-2756) MultiSearcher.rewrite() incorrectly rewrites queries
[ https://issues.apache.org/jira/browse/LUCENE-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2756. Resolution: Fixed Fix Version/s: 4.0, 3.1

MultiSearcher is now deprecated/removed.

MultiSearcher.rewrite() incorrectly rewrites queries
---
Key: LUCENE-2756
URL: https://issues.apache.org/jira/browse/LUCENE-2756
Project: Lucene - Java
Issue Type: Bug
Components: Search
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2756_testcase.patch

This was reported on the user list, in the context of range queries. It's also easy to make our existing tests fail with my patch on LUCENE-2751:
{noformat}
ant test-core -Dtestcase=TestBoolean2 -Dtestmethod=testRandomQueries -Dtests.seed=7679849347282878725:-903778383189134045
{noformat}
The fundamental problem is that MultiSearcher first rewrites against the individual subs, then uses Query.combine(), which simply ORs the rewritten sub-clauses together. This is incorrect for expanded MUST_NOT queries (e.g. from a wildcard), as it violates De Morgan's law.
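The De Morgan violation described above can be demonstrated with plain sets, no Lucene required. Suppose a MUST_NOT wildcard expands to different terms on each sub-searcher (docs matching "wolf" on sub 1, "wasp" on sub 2; all names and doc ids here are invented for illustration). The intended result is NOT(wolf OR wasp) = NOT(wolf) AND NOT(wasp), but OR-ing the per-sub rewrites computes NOT(wolf) OR NOT(wasp):

```java
import java.util.HashSet;
import java.util.Set;

// Set-level demonstration of the LUCENE-2756 rewrite bug.
public class DeMorganDemo {
    // Complement of s within the full doc set.
    public static Set<Integer> not(Set<Integer> all, Set<Integer> s) {
        Set<Integer> r = new HashSet<>(all);
        r.removeAll(s);
        return r;
    }

    public static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new HashSet<>(a);
        r.addAll(b);
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> all = Set.of(1, 2, 3, 4);
        Set<Integer> wolf = Set.of(1); // expansion seen by sub-searcher 1
        Set<Integer> wasp = Set.of(2); // expansion seen by sub-searcher 2

        // Correct: expand globally, then negate once -> {3, 4}
        Set<Integer> correct = not(all, union(wolf, wasp));
        // Buggy (MultiSearcher): negate per sub, then OR -> {1, 2, 3, 4}
        Set<Integer> buggy = union(not(all, wolf), not(all, wasp));
        System.out.println(correct + " vs " + buggy);
    }
}
```

The buggy combination matches every document, including the ones the MUST_NOT clause was supposed to exclude, which is exactly the failure mode reported for range and wildcard queries.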
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979888#action_12979888 ] Earwin Burrfoot commented on LUCENE-2474:

bq. Earwin's working on improving this, I think, under LUCENE-2355

I stalled, and then there were just so many changes in trunk, so I have to restart now :) Thanks for another kick.

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
---
Key: LUCENE-2474
URL: https://issues.apache.org/jira/browse/LUCENE-2474
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Shay Banon
Attachments: LUCENE-2474.patch, LUCENE-2474.patch
[jira] Resolved: (LUCENE-2855) Contrib queryparser should not use CharSequence as Map key
[ https://issues.apache.org/jira/browse/LUCENE-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani resolved LUCENE-2855. Resolution: Fixed

Patch applied on revision 1057454.

Contrib queryparser should not use CharSequence as Map key
---
Key: LUCENE-2855
URL: https://issues.apache.org/jira/browse/LUCENE-2855
Project: Lucene - Java
Issue Type: Bug
Components: contrib/*
Affects Versions: 3.0.3
Reporter: Adriano Crestani
Assignee: Adriano Crestani
Fix For: 3.0.4
Attachments: lucene_2855_adriano_crestani_2011_01_08.patch, lucene_2855_adriano_crestani_2011_01_09.patch

Today, the contrib query parser uses Map<CharSequence, ...> in many different places, which may lead to problems, since the CharSequence interface does not enforce implementation of the hashCode and equals methods. Today it's causing a problem with the QueryTreeBuilder.setBuilder(CharSequence, QueryBuilder) method, which does not work as expected.
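The hazard behind this issue is easy to reproduce with the JDK alone: StringBuilder implements CharSequence but inherits identity-based hashCode/equals from Object, so a map keyed by CharSequence silently fails lookups whenever the key and the probe are different objects. The class and method names below are invented for the demonstration.

```java
import java.util.HashMap;
import java.util.Map;

// Why CharSequence is unsafe as a HashMap key: the interface does not
// require value-based hashCode/equals, and StringBuilder doesn't provide them.
public class CharSequenceKeyDemo {
    public static boolean lookupWorks() {
        Map<CharSequence, String> byCharSequence = new HashMap<>();
        byCharSequence.put(new StringBuilder("field"), "builder");
        // Both lookups miss: StringBuilder compares by object identity,
        // so neither an equal String nor a fresh StringBuilder finds the entry.
        return byCharSequence.containsKey("field")
            || byCharSequence.containsKey(new StringBuilder("field"));
    }

    public static boolean stringKeyWorks() {
        Map<String, String> byString = new HashMap<>();
        byString.put(new StringBuilder("field").toString(), "string");
        // Hits: String defines value-based hashCode/equals.
        return byString.containsKey("field");
    }
}
```

Converting keys to String at the map boundary, as the fix does for QueryTreeBuilder.setBuilder, makes lookups behave by value regardless of which CharSequence implementation the caller passed in.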
Lucene-trunk - Build # 1422 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1422/ All tests passed Build Log (for compile errors): [...truncated 16681 lines...]
Solr-3.x - Build # 224 - Failure
Build: https://hudson.apache.org/hudson/job/Solr-3.x/224/ All tests passed Build Log (for compile errors): [...truncated 20277 lines...]