[ https://issues.apache.org/jira/browse/SOLR-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881361#comment-15881361 ]
ASF subversion and git services commented on SOLR-9764:
--------------------------------------------------------

Commit 05c17c9a516d8501b2dcce9b5910a3d0b5510bc4 in lucene-solr's branch refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=05c17c9 ]

SOLR-9764: fix CHANGES entry

> Design a memory efficient DocSet if a query returns all docs
> -------------------------------------------------------------
>
>                 Key: SOLR-9764
>                 URL: https://issues.apache.org/jira/browse/SOLR-9764
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Michael Sun
>            Assignee: Yonik Seeley
>             Fix For: 6.5, master (7.0)
>
>         Attachments: SOLR_9764_no_cloneMe.patch, SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch
>
> In some use cases, particularly time series use cases that use a collection alias and partition data into multiple small collections by timestamp, a filter query can match all documents in a collection. Currently a BitDocSet is used, which contains a large array of long integers with every bit set to 1. After querying, the resulting DocSet saved in the filter cache is large and becomes one of the main memory consumers in these use cases.
>
> For example, suppose a Solr setup has 14 collections covering the last 14 days, each collection holding one day of data. A filter query for the last week of data would put at least six DocSets in the filter cache, each matching all documents in one of six collections.
>
> This issue is to design a new DocSet that is memory efficient for such a use case. The new DocSet removes the large array, reducing memory usage and GC pressure without losing the advantages of a large filter cache. The gain can be especially large for the time series setups described above.
>
> For further optimization, it may be helpful to design a DocSet with run-length encoding. Thanks [~mmokhtar] for the suggestion.
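As a point of reference on the memory math: a BitDocSet over an index of maxDoc documents costs maxDoc/8 bytes no matter how many documents match, so a 100M-document collection costs roughly 12.5 MB per cached filter entry. A set known to match everything only needs to remember the document count. Below is a minimal sketch of that idea in Java; the class name MatchAllDocSet and its methods are illustrative only, not necessarily what the committed patch uses.

{code:java}
// Illustrative sketch only; MatchAllDocSet is a hypothetical name, not
// necessarily the class introduced by the SOLR-9764 patch.

/** A DocSet for the "query matches every document" case: stores only the
 *  document count instead of one bit per document. */
class MatchAllDocSet {
  private final int maxDoc; // number of docs in the index; all of them match

  MatchAllDocSet(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  int size() {
    return maxDoc;
  }

  /** Every in-range doc id matches, so no lookup structure is needed. */
  boolean exists(int docId) {
    return docId >= 0 && docId < maxDoc;
  }

  /** Iterate doc ids 0..maxDoc-1 without materializing any array. */
  java.util.PrimitiveIterator.OfInt iterator() {
    return java.util.stream.IntStream.range(0, maxDoc).iterator();
  }

  /** O(1) memory versus maxDoc/8 bytes for a bit set. */
  long memSize() {
    return 4; // just the int field, ignoring object header overhead
  }
}
{code}

Set operations also become trivial with such a set: intersecting it with any DocSet X simply yields X, and its union with X is the set itself, so the long-array work disappears there as well.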
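On the run-length encoding idea: when matching doc ids are clustered, as they tend to be with time-partitioned data, storing [start, end) runs can be far smaller than one bit per document, and the match-all case degenerates to a single run. A rough sketch, again with hypothetical names:

{code:java}
// Rough illustrative sketch of a run-length encoded doc id set; not from the patch.
import java.util.Arrays;

/** Stores sorted, non-overlapping runs of doc ids as parallel arrays:
 *  starts[i] is the first doc id of run i, ends[i] is one past the last. */
class RunLengthDocSet {
  private final int[] starts;
  private final int[] ends;

  RunLengthDocSet(int[] starts, int[] ends) {
    this.starts = starts;
    this.ends = ends;
  }

  /** Binary-search for the run containing docId, if any. */
  boolean exists(int docId) {
    int i = Arrays.binarySearch(starts, docId);
    if (i >= 0) {
      return true;                  // docId is the first doc of a run
    }
    int run = -i - 2;               // index of the last run starting before docId
    return run >= 0 && docId < ends[run];
  }

  int size() {
    int total = 0;
    for (int i = 0; i < starts.length; i++) {
      total += ends[i] - starts[i];
    }
    return total;
  }

  /** Memory is proportional to the number of runs, not to maxDoc:
   *  a single run covering the whole index costs two ints. */
  long memSize() {
    return 8L * starts.length;      // two ints per run
  }
}
{code}

The trade-off is that a set with many scattered doc ids can take more space than a plain bit set, so a real implementation would likely choose a representation per filter-cache entry based on how the matches are distributed.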