[
https://issues.apache.org/jira/browse/SOLR-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685203#comment-15685203
]
Michael Sun edited comment on SOLR-9764 at 11/22/16 12:18 AM:
--------------------------------------------------------------
Here are some single-user test results for the amount of memory saved.
Setup: Solr with a collection alias mapping to two collections, each with 4
days of data.
Test: Restart Solr, run a query with a filter for the last 7 days, and collect
a memory histogram on one server afterwards. The filter hits both collections,
matching all documents in one and only part of the documents in the other.
Result (extracted from histogram)
|Patched|#BitDocSet instances|#MatchAllDocSet instances|bytes for [J|Saving|
|Y|2|2|10001833664|6.9M|
|N|4|0|10008701640| |
Validation:
The difference in bytes for long[] is 6867976 bytes (6.9M). That is the total
amount of memory saved by MatchAllDocSet for one query. Since 2 MatchAllDocSet
instances are used, each saves 3433988 bytes (3.4M). On the other side, the core
under study has 27M documents, which requires a long[] of about 3.4M bytes
(27M/8), which is consistent with the memory saved according to the histogram.
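
The arithmetic in the validation can be reproduced with a short sketch. The
class and method names below are hypothetical and only for illustration. The
byte counts are the ones from the histogram above, and the estimate assumes
one bit per document packed into 64-bit words, ignoring array header overhead.

```java
// Hypothetical helper for re-checking the numbers in the validation;
// not part of Solr or of the attached patches.
class MatchAllSavingCheck {

    // Total long[] bytes saved: unpatched histogram minus patched histogram.
    static long savedBytes(long unpatchedBytes, long patchedBytes) {
        return unpatchedBytes - patchedBytes;
    }

    // Bytes of long[] payload needed for one bit per document,
    // rounded up to whole 64-bit words (array header not counted).
    static long bitSetBytes(long maxDoc) {
        long words = (maxDoc + 63) / 64;
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        long saved = savedBytes(10008701640L, 10001833664L);
        long perDocSet = saved / 2;       // two MatchAllDocSet instances
        long impliedDocs = perDocSet * 8; // one bit per document

        System.out.println("total saved bytes:    " + saved);
        System.out.println("saved per DocSet:     " + perDocSet);
        System.out.println("implied doc count:    " + impliedDocs);
        System.out.println("estimate for 27M docs: " + bitSetBytes(27_000_000L));
    }
}
```

The implied document count comes out slightly above 27M, which fits the
rounded figure quoted for the core.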
> Design a memory efficient DocSet if a query returns all docs
> ------------------------------------------------------------
>
> Key: SOLR-9764
> URL: https://issues.apache.org/jira/browse/SOLR-9764
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Michael Sun
> Attachments: SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch,
> SOLR_9764_no_cloneMe.patch
>
>
> In some use cases, particularly time series use cases that use a collection
> alias and partition data into multiple small collections by timestamp, a
> filter query can match all documents in a collection. Currently BitDocSet is
> used, which contains a large array of long integers with every bit set to 1.
> After querying, the resulting DocSet saved in the filter cache is large and
> becomes one of the main memory consumers in these use cases.
> For example, suppose a Solr setup has 14 collections for data from the last
> 14 days, each collection holding one day of data. A filter query for the last
> week of data would result in at least six DocSets in the filter cache, each
> matching all documents in one of six collections.
> This issue is to design a new DocSet that is memory efficient for such a use
> case. The new DocSet removes the large array and reduces memory usage and GC
> pressure without losing the advantage of a large filter cache.
> In particular, for time series use cases that use a collection alias and
> partition data into multiple small collections by timestamp, the gain
> can be large.
> For further optimization, it may be helpful to design a DocSet with run
> length encoding. Thanks [~mmokhtar] for the suggestion.
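
The idea in the description can be illustrated with a minimal sketch: a set
that matches every document only needs to remember the document count, and the
bit array can be built on demand. The class below is hypothetical and is not
the MatchAllDocSet from the attached patches; it does not implement Solr's
DocSet interface and only shows where the memory saving comes from.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Solr's DocSet API.
final class MatchAllSketch {
    private final int maxDoc; // the only state kept: number of documents

    MatchAllSketch(int maxDoc) {
        if (maxDoc < 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0");
        }
        this.maxDoc = maxDoc;
    }

    // Every valid document id is a member.
    boolean exists(int docId) {
        return docId >= 0 && docId < maxDoc;
    }

    int size() {
        return maxDoc;
    }

    // Materialize the equivalent bit array only when a caller needs it.
    long[] toBits() {
        int words = (maxDoc + 63) >>> 6;
        long[] bits = new long[words];
        Arrays.fill(bits, -1L); // all bits set
        int tail = maxDoc & 63; // bits used in the last word
        if (words > 0 && tail != 0) {
            bits[words - 1] = (1L << tail) - 1; // clear bits past maxDoc
        }
        return bits;
    }
}
```

With this shape, a cached entry costs a few bytes instead of roughly maxDoc/8
bytes, at the price of rebuilding the bits if some operation needs them.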