[jira] Commented: (LUCENE-1632) boolean docid set iterator improvement
[ https://issues.apache.org/jira/browse/LUCENE-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708659#action_12708659 ]

John Wang commented on LUCENE-1632:
-----------------------------------

I think we have an improvement for ConjunctionScorer as well, with about a 10% gain. We need to clean it up for a patch.

To be clear, these are not algorithmic changes; this is code-tuning work performed on the same algorithm. The naming is kept consistent with the current Lucene class names, e.g. DocIdSet, DocIdSetIterator. Feel free to do more code tuning if you feel it would improve performance further.

> boolean docid set iterator improvement
> --------------------------------------
>
>                 Key: LUCENE-1632
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1632
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: John Wang
>         Attachments: Lucene-1632-patch.txt
>
>
> This was first brought up in LUCENE-1345, but that conversation has
> digressed. As suggested there, creating a separate issue to track.
> Added perf comparisons of the boolean set iterators with the current scorers;
> see patch.
>
> System: Ubuntu, java version "1.6.0_11", Intel Core 2 Duo 2.44 GHz
>
> new milliseconds: 470, 534, 450, 443, 444, 445, 449, 441, 444, 445 (total 4565)
> old milliseconds: 529, 491, 428, 549, 427, 424, 420, 424, 423, 422 (total 4537)
> New/Old Time 4565/4537 (100.61715%)
>
> OrDocIdSetIterator milliseconds: 1138, 1106, 1065, 1066, 1065, 1067, 1072, 1118, 1065, 1069 (total 10831)
> DisjunctionMaxScorer milliseconds: 1914, 1981, 1861, 1893, 1886, 1885, 1887, 1889, 1891, 1888 (total 18975)
> Or/DisjunctionMax Time 10831/18975 (57.080368%)
>
> OrDocIdSetIterator milliseconds: 1079, 1075, 1076, 1093, 1077, 1074, 1078, 1075, 1074, 1074 (total 10775)
> DisjunctionSumScorer milliseconds: 1398, 1322, 1320, 1305, 1304, 1301, 1304, 1300, 1301, 1317 (total 13172)
> Or/DisjunctionSum Time 10775/13172 (81.80231%)
>
> AndDocIdSetIterator milliseconds: 330, 336, 298, 299, 310, 298, 298, 334, 298, 299 (total 3100)
> ConjunctionScorer milliseconds: 332, 307, 302, 350, 300, 304, 305, 303, 303, 299 (total 3105)
> And/Conjunction Time 3100/3105 (99.83897%)
>
> Main contributors to the patch: Anmol Bhasin & Yasuhiro Matsuda

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr.
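The OrDocIdSetIterator timed above merges several sorted doc-id streams into one ascending, de-duplicated stream. As a rough illustration of what such an OR iterator does (a minimal sketch over int arrays using a min-heap of per-list cursors; this is NOT the patch's code, and the class and method names are made up):

```java
import java.util.PriorityQueue;

// Hypothetical sketch of a disjunction (OR) doc-id iterator: yields the
// union of several sorted doc-id lists in ascending order, skipping
// duplicates. Illustrative only; not the LUCENE-1632 implementation.
public class OrDocIdIterator {
    private final int[][] lists;
    private final PriorityQueue<int[]> heap; // cursors {listIndex, position}, ordered by current doc id
    private int doc = -1;

    public OrDocIdIterator(int[][] sortedLists) {
        this.lists = sortedLists;
        this.heap = new PriorityQueue<>((a, b) -> lists[a[0]][a[1]] - lists[b[0]][b[1]]);
        for (int i = 0; i < lists.length; i++) {
            if (lists[i].length > 0) heap.add(new int[]{i, 0});
        }
    }

    /** Returns the next doc id in the union, or -1 when exhausted. */
    public int nextDoc() {
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            int candidate = lists[top[0]][top[1]];
            if (top[1] + 1 < lists[top[0]].length) {
                heap.add(new int[]{top[0], top[1] + 1}); // advance this list's cursor
            }
            if (candidate != doc) { // skip duplicates across lists
                doc = candidate;
                return doc;
            }
        }
        return -1;
    }
}
```

The real iterators work against DocIdSetIterator instances rather than arrays, but the heap-of-cursors structure is the standard way to take a disjunction of sorted streams.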
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708590#action_12708590 ]

Paul Elschot commented on LUCENE-1410:
--------------------------------------

A very recent paper with some improvements to PFOR:

Yan, Ding, Suel: Inverted Index Compression and Query Processing with Optimized Document Ordering, WWW 2009, April 20-24 2009, Madrid, Spain.

Roughly quoting par. 4.2, Optimizing PForDelta compression: for an exception, we store its lower b bits in its corresponding slot, instead of the offset to the next exception, while we store the higher overflow bits and the offset in two separate arrays. These two arrays are compressed using the Simple16 method. Also, b is chosen to optimize decompression speed. This makes the dependence of b on the data quite simple (in the PFOR above here this dependence is more complex), and this improves compression speed.

Btw, the document ordering there is by URL. For many terms this gives shorter deltas between doc ids, allowing higher decompression speed of the doc ids.

> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, LUCENE-1410b.patch, LUCENE-1410c.patch,
> LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java,
> TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.
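The quoted scheme can be sketched as follows: each value's low b bits go into its slot, and values that do not fit in b bits become exceptions whose high (overflow) bits and positions are kept in two side arrays. This is a hedged illustration only; the side arrays are left as plain lists here, whereas the paper compresses them with Simple16, and all names are made up rather than taken from any patch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the PForDelta variant described above.
// Not Lucene code; bit-level packing of the slots is omitted for clarity.
public class PForDeltaSketch {
    public final int b;                                        // bits kept per slot
    public final int[] slots;                                  // low b bits of every value
    public final List<Integer> overflow = new ArrayList<>();   // high bits of exceptions
    public final List<Integer> positions = new ArrayList<>();  // where the exceptions sit

    public PForDeltaSketch(int[] values, int b) {
        this.b = b;
        this.slots = new int[values.length];
        int mask = (1 << b) - 1;
        for (int i = 0; i < values.length; i++) {
            slots[i] = values[i] & mask;          // lower b bits into the slot
            if ((values[i] >>> b) != 0) {         // value does not fit: record exception
                overflow.add(values[i] >>> b);
                positions.add(i);
            }
        }
    }

    /** Rebuilds the original values by patching the overflow bits back in. */
    public int[] decode() {
        int[] out = slots.clone();
        for (int e = 0; e < positions.size(); e++) {
            out[positions.get(e)] |= overflow.get(e) << b;
        }
        return out;
    }
}
```

The point of the layout is that decoding the common case is a tight, branch-free loop over fixed-width slots, with the (rare) exception patch-up done afterwards.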
[jira] Commented: (LUCENE-1313) Realtime Search
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708578#action_12708578 ]

Jason Rutherglen commented on LUCENE-1313:
------------------------------------------

I think the easiest way to handle the ram buffer size vs. the ram dir size is to allow each to grow on request. I have some code I need to test that implements this. This way we're growing based on demand and availability. The only thing we may want to add is a way to grow, and perhaps automatically flush, based on the growth requested, and perhaps prioritizing requests?

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch,
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch,
> LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch,
> lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.
> Possible future directions:
> * Optimistic concurrency
> * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory
> enables replication. It is difficult to replicate using other methods
> because while the document may easily be serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and
> searching concurrently.
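The grow-on-request idea above could look roughly like this: a shared byte budget that both the ram buffer and the ram dir draw from, where a request that cannot be satisfied signals that a flush is needed. This is only a sketch of the idea under assumed names and a trivial flush policy, not the code Jason mentions:

```java
// Hypothetical sketch of a demand-driven RAM budget shared by two
// consumers (ram buffer and ram dir). Names and policy are assumptions.
public class RamBudget {
    private final long maxBytes;
    private long used;

    public RamBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /**
     * Tries to reserve additional bytes for a consumer.
     * Returns false when demand exceeds availability, i.e. the caller
     * should flush (and then release) before growing further.
     */
    public synchronized boolean requestGrowth(long bytes) {
        if (used + bytes > maxBytes) {
            return false;
        }
        used += bytes;
        return true;
    }

    /** Returns bytes to the budget, e.g. after a flush. */
    public synchronized void release(long bytes) {
        used = Math.max(0, used - bytes);
    }
}
```

Prioritizing requests, as suggested, would amount to deciding which consumer's `requestGrowth` is honored first when the remaining budget can only satisfy one of them.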
InstantiatedIndex Memory required
Hi,

So far I have been using RAMDirectory for my indexes. To meet the SLA of our project, I thought of using InstantiatedIndex. But when I use it, I am not able to get any output, and it throws an out-of-memory error. What is the ratio between index size and memory size when using InstantiatedIndex?

Here are my index details:
Index size: 200 MB
RAM size: 1 GB

If I try with a small test index of size 100 KB, it works. Please help me with this.

Thanks,
Ravichandra
--
View this message in context: http://www.nabble.com/InstantiatedIndex-Memory-required-tp23506231p23506231.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
[jira] Updated: (LUCENE-1634) LogMergePolicy should use the number of deleted docs when deciding which segments to merge
[ https://issues.apache.org/jira/browse/LUCENE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yasuhiro Matsuda updated LUCENE-1634:
-------------------------------------

Attachment: LUCENE-1634.patch

I posted a patch.

> LogMergePolicy should use the number of deleted docs when deciding which
> segments to merge
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1634
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1634
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Yasuhiro Matsuda
>         Attachments: LUCENE-1634.patch
>
>
> I found that the IndexWriter.optimize(int) method does not pick up large segments
> with a lot of deletes, even when most of the docs are deleted. And the
> existence of such segments affected query performance significantly.
> I created an index with 1 million docs, then went over all docs and updated a
> few thousand at a time. I ran optimize(20) occasionally. What I saw were large
> segments with most of their docs deleted. Although these segments had almost no
> valid docs, they remained in the directory for a very long time, until more
> segments with comparable or bigger sizes were created.
> This is because LogMergePolicy.findMergeForOptimize uses the size of segments
> but does not take the number of deleted documents into consideration when it
> decides which segments to merge. So a simple fix is to use the delete count
> to calibrate the segment size. I can create a patch for this.
[jira] Created: (LUCENE-1634) LogMergePolicy should use the number of deleted docs when deciding which segments to merge
LogMergePolicy should use the number of deleted docs when deciding which segments to merge
------------------------------------------------------------------------------------------

                Key: LUCENE-1634
                URL: https://issues.apache.org/jira/browse/LUCENE-1634
            Project: Lucene - Java
         Issue Type: Improvement
         Components: Index
           Reporter: Yasuhiro Matsuda

I found that the IndexWriter.optimize(int) method does not pick up large segments with a lot of deletes, even when most of the docs are deleted. And the existence of such segments affected query performance significantly.

I created an index with 1 million docs, then went over all docs and updated a few thousand at a time. I ran optimize(20) occasionally. What I saw were large segments with most of their docs deleted. Although these segments had almost no valid docs, they remained in the directory for a very long time, until more segments with comparable or bigger sizes were created.

This is because LogMergePolicy.findMergeForOptimize uses the size of segments but does not take the number of deleted documents into consideration when it decides which segments to merge. So a simple fix is to use the delete count to calibrate the segment size. I can create a patch for this.
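The proposed calibration can be sketched as scaling a segment's byte size by its live-doc ratio before the merge policy compares sizes, so a large segment that is mostly deletes looks small and becomes eligible for merging. A minimal illustration (method and parameter names are hypothetical, not the attached patch):

```java
// Hedged sketch of delete-count calibration for segment sizes.
// A segment of 1 GB with 90% of its docs deleted would be treated
// as roughly 100 MB when deciding which segments to merge.
public class SegmentSizeCalibration {
    /** Effective segment size: byte size scaled by the live-doc ratio. */
    public static long calibratedSize(long sizeInBytes, int docCount, int delCount) {
        if (docCount <= 0) {
            return 0;
        }
        double liveRatio = (double) (docCount - delCount) / docCount;
        return (long) (sizeInBytes * liveRatio);
    }
}
```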
[jira] Assigned: (LUCENE-1455) org.apache.lucene.ant.HtmlDocument creates a FileInputStream in its constructor that it doesn't close
[ https://issues.apache.org/jira/browse/LUCENE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned LUCENE-1455:
-----------------------------------

Assignee: Mark Miller

> org.apache.lucene.ant.HtmlDocument creates a FileInputStream in its
> constructor that it doesn't close
> ------------------------------------------------------------------
>
>                 Key: LUCENE-1455
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1455
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>
> A look through the jtidy source code doesn't show a close that I can find in
> parse (it seems to be standard that you close your own streams anyway), so this
> looks like a small descriptor leak to me.
[jira] Assigned: (LUCENE-1598) While you could use a custom Sort Comparator source with remote searchable before, you can no longer do so with FieldComparatorSource
[ https://issues.apache.org/jira/browse/LUCENE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned LUCENE-1598:
-----------------------------------

Assignee: Mark Miller

> While you could use a custom Sort Comparator source with remote searchable
> before, you can no longer do so with FieldComparatorSource
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-1598
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1598
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>
> FieldComparatorSource is not serializable, but can live on a SortField
[jira] Resolved: (LUCENE-1633) Copy/Paste-Typo in toString() for SpanQueryFilter
[ https://issues.apache.org/jira/browse/LUCENE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved LUCENE-1633.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9

Committed.

> Copy/Paste-Typo in toString() for SpanQueryFilter
> -------------------------------------------------
>
>                 Key: LUCENE-1633
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1633
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Bernd Fondermann
>            Priority: Trivial
>             Fix For: 2.9
>
>         Attachments: fix_SpanQueryFilter_toString.patch
>
>
>    public String toString() {
> -    return "QueryWrapperFilter(" + query + ")";
> +    return "SpanQueryFilter(" + query + ")";
>    }
>
> says it all.
[jira] Updated: (LUCENE-1633) Copy/Paste-Typo in toString() for SpanQueryFilter
[ https://issues.apache.org/jira/browse/LUCENE-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Fondermann updated LUCENE-1633:
-------------------------------------

Attachment: fix_SpanQueryFilter_toString.patch

> Copy/Paste-Typo in toString() for SpanQueryFilter
> -------------------------------------------------
>
>                 Key: LUCENE-1633
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1633
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Bernd Fondermann
>            Priority: Trivial
>         Attachments: fix_SpanQueryFilter_toString.patch
>
>
>    public String toString() {
> -    return "QueryWrapperFilter(" + query + ")";
> +    return "SpanQueryFilter(" + query + ")";
>    }
>
> says it all.
[jira] Created: (LUCENE-1633) Copy/Paste-Typo in toString() for SpanQueryFilter
Copy/Paste-Typo in toString() for SpanQueryFilter
-------------------------------------------------

                Key: LUCENE-1633
                URL: https://issues.apache.org/jira/browse/LUCENE-1633
            Project: Lucene - Java
         Issue Type: Bug
           Reporter: Bernd Fondermann
           Priority: Trivial

   public String toString() {
-    return "QueryWrapperFilter(" + query + ")";
+    return "SpanQueryFilter(" + query + ")";
   }

says it all.