Re: JVM GC Issue
I think the article below explains it well: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I was thinking that docValues need to be transitioned into the JVM from the OS cache. Turns out that is not required, as the docValues are mapped into the virtual address space by the OS. The JVM need not think about loading them into its own memory, since it can access the mapped virtual memory just as easily. The OS keeps track of whether the docValues should be loaded into physical memory (if their addresses are actually being accessed by the JVM) or just keep lying on disk.

Thx - SG

On Sun, Dec 3, 2017 at 12:02 PM, Shawn Heisey wrote:
> On 12/2/2017 6:59 PM, S G wrote:
>> I am a bit curious about the docValues implementation. I understand that docValues do not use JVM memory and make use of the OS cache - that is why they are more performant.
>>
>> But to return any response from the docValues, the values in the docValues' column-oriented structures would need to be brought into the JVM's memory. And that will then increase the pressure on the JVM's memory anyway. So how do docValues actually help from a memory perspective?
>
> What I'm writing below is my understanding of docValues. If it turns out that I've gotten any of it wrong, that is MY error, not Solr's.
>
> When there are no docValues, Solr must do something called "uninverting the index" in order to satisfy certain operations -- primarily faceting, grouping, and sorting.
>
> A Lucene index is an inverted index. This means that it is a big list of terms, and each of those entries has a second list that describes which fields and documents have the term, as well as some other information like positions. Uninverting the index is pretty efficient, but it does take time. The uninverted index structure is a list of all terms for a specific field. Then there's a second phase -- the info in the uninverted field is read and processed for the query operation, which will use heap. I do not know if there are additional phases. There might be.
>
> In case you don't know, in the Lucene index, docValues data on disk consists of every entry in the index for one field, written sequentially in an uncompressed format.
>
> This means that for those query types, docValues is *exactly* what Solr needs for the first phase. And instead of generating it into heap memory and then reading it, Solr can just read the data right off the disk (which the OS caches, so it might be REALLY fast and use OS memory) in order to handle the second and later phases. This is faster than building an uninverted field, and consumes no heap memory.
>
> As I mentioned, the uninverted data is built from indexed terms. The contents of docValues data is the same as a stored field -- the original indexed data. Because docValues cannot be added to fields using solr.TextField, the only type that undergoes text analysis, there's no possibility of a difference between an uninverted field and docValues.
>
> Thanks,
> Shawn
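For anyone who wants to see the mechanism Uwe's post describes in isolation, here is a minimal Java NIO sketch (my own illustration, not Lucene code; the file path is made up). The map() call only reserves virtual address space; pages are faulted into the OS page cache when they are touched, and nothing but the few bytes actually copied out ever lands on the JVM heap.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical path to a docValues data file; any file works for the demo.
        try (FileChannel ch = FileChannel.open(Paths.get("/var/solr/data/index/_0.dvd"),
                                               StandardOpenOption.READ)) {
            // Reserves virtual address space only -- no heap allocation for the file data.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touching a position faults that page into the OS cache (or reads it from the
            // cache if already resident); the heap only holds the 8 bytes copied into 'first'.
            long first = buf.getLong(0);
            System.out.println("first 8 bytes as a long: " + first);
        }
    }
}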
Re: JVM GC Issue
On 12/2/2017 6:59 PM, S G wrote:
> I am a bit curious about the docValues implementation. I understand that docValues do not use JVM memory and make use of the OS cache - that is why they are more performant.
>
> But to return any response from the docValues, the values in the docValues' column-oriented structures would need to be brought into the JVM's memory. And that will then increase the pressure on the JVM's memory anyway. So how do docValues actually help from a memory perspective?

What I'm writing below is my understanding of docValues. If it turns out that I've gotten any of it wrong, that is MY error, not Solr's.

When there are no docValues, Solr must do something called "uninverting the index" in order to satisfy certain operations -- primarily faceting, grouping, and sorting.

A Lucene index is an inverted index. This means that it is a big list of terms, and each of those entries has a second list that describes which fields and documents have the term, as well as some other information like positions. Uninverting the index is pretty efficient, but it does take time. The uninverted index structure is a list of all terms for a specific field. Then there's a second phase -- the info in the uninverted field is read and processed for the query operation, which will use heap. I do not know if there are additional phases. There might be.

In case you don't know, in the Lucene index, docValues data on disk consists of every entry in the index for one field, written sequentially in an uncompressed format.

This means that for those query types, docValues is *exactly* what Solr needs for the first phase. And instead of generating it into heap memory and then reading it, Solr can just read the data right off the disk (which the OS caches, so it might be REALLY fast and use OS memory) in order to handle the second and later phases. This is faster than building an uninverted field, and consumes no heap memory.

As I mentioned, the uninverted data is built from indexed terms. The contents of docValues data is the same as a stored field -- the original indexed data. Because docValues cannot be added to fields using solr.TextField, the only type that undergoes text analysis, there's no possibility of a difference between an uninverted field and docValues.

Thanks,
Shawn
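To make the "uninverting" step concrete, here is a toy Java sketch (my own illustration, not Lucene's actual code, with types simplified to strings and lists). It flips the term -> documents mapping of an inverted index into the document -> value array that faceting, sorting and grouping need; that array is the kind of structure that ends up on the heap when docValues are absent, and that can instead be read straight from the docValues column on disk when they are present.

import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class UninvertSketch {
    // postings: term -> ids of documents containing that term (one single-valued field)
    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] docToValue = new String[maxDoc];   // heap array, one slot per document
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                docToValue[docId] = entry.getKey(); // flip term -> doc into doc -> term
            }
        }
        return docToValue;                          // roughly what an on-heap fieldCache entry holds
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = Map.of(
                "red",  List.of(0, 2),
                "blue", List.of(1));
        System.out.println(Arrays.toString(uninvert(postings, 3))); // [red, blue, red]
    }
}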
Re: JVM GC Issue
On Sat, Dec 2, 2017 at 8:59 PM, S G wrote:
> I am a bit curious about the docValues implementation. I understand that docValues do not use JVM memory and make use of the OS cache - that is why they are more performant.
>
> But to return any response from the docValues, the values in the docValues' column-oriented structures would need to be brought into the JVM's memory. And that will then increase the pressure on the JVM's memory anyway. So how do docValues actually help from a memory perspective?

docValues are not more performant than the on-heap fieldCache once a fieldCache entry has been populated. docValues do help with memory in a number of ways:

1) off-heap memory use degrades much more gracefully... heap memory will just cause an OOM exception when the heap size is exceeded
2) off-heap memory (memory used by the OS cache) can be dynamically shared with other processes running on the system (the OS allocates as needed)
3) easier to configure (as opposed to max heap size) since the OS will just automatically use all free memory to cache relevant parts of memory-mapped disk files
4) off-heap memory does not need to be garbage collected (helps people that have huge GC pauses)

-Yonik
Re: JVM GC Issue
I am a bit curious about the docValues implementation. I understand that docValues do not use JVM memory and make use of the OS cache - that is why they are more performant.

But to return any response from the docValues, the values in the docValues' column-oriented structures would need to be brought into the JVM's memory. And that will then increase the pressure on the JVM's memory anyway. So how do docValues actually help from a memory perspective?

Thanks
SG

On Sat, Dec 2, 2017 at 12:39 AM, Dominique Bejean wrote:
> Hi,
>
> Thank you for the explanations about faceting. I was thinking the hit count had the biggest impact on the facet memory lifecycle. Regardless of the hit count, there is a query peak at the time the issue occurs. This is relative in regard to what Solr is supposed to be able to handle, but this should be sufficient to explain GC activity growing:
>
> 198 10:07
> 208 10:08
> 267 10:09
> 285 10:10
> 244 10:11
> 286 10:12
> 277 10:13
> 252 10:14
> 183 10:15
> 302 10:16
> 299 10:17
> 273 10:18
> 348 10:19
> 468 10:20
> 496 10:21
> 673 10:22
> 496 10:23
> 101 10:24
>
> At the time the issue occurs, we see the CPU activity growing very high. Maybe there is a lack of CPU. So, I will suggest all actions that will remove pressure on heap memory:
>
> - enable docValues
> - divide cache sizes by 2 in order to go back to the Solr defaults
> - refine the fl parameter, as I know it can be optimized
>
> Concerning the phonetic filter, it will be removed anyway, as a large number of results are really irrelevant.
>
> Regards.
>
> Dominique
>
> On Sat, Dec 2, 2017 at 4:25 AM, Erick Erickson wrote:
> > Dominique:
> >
> > Actually, the memory requirements shouldn't really go up as the number of hits increases. The general algorithm is (say rows=10):
> > Calculate the score of each doc.
> > If the score is zero, ignore it.
> > If the score is > the minimum in my current top 10, replace the lowest scoring doc in my current top 10 with the new doc (a PriorityQueue, last I knew).
> > Else discard the doc.
> >
> > When all docs have been scored, assemble the return from the top 10 (or whatever rows is set to).
> >
> > The key here is that most of the Solr index is kept in MMapDirectory/OS space, see Uwe's excellent blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html. In terms of _searching_, very little of the Lucene index structures is kept in memory.
> >
> > That said, faceting plays a bit loose with the rules. If you have docValues set to true, most of the memory structures are in the OS memory space, not the JVM. If you have docValues set to false, then the "uninverted" structure is built in the JVM heap space.
> >
> > Additionally, the JVM requirements are sensitive to the number of unique values in the field being faceted on. For instance, let's say you faceted by a date field with just facet.field=some_date_field. A bucket would have to be allocated to hold the counts for each and every unique date value, i.e. one for each millisecond in your search, which might be something you're seeing. Conceptually this is just an array[uniqueValues] of ints (longs? I'm not sure). This should be relatively easily testable by omitting the facets while measuring.
> >
> > Where the number of rows _does_ make a difference is in the return packet. Say I have rows=10. In that case I create a single return packet with all 10 docs' "fl" fields. If rows = 10,000 then that return packet is obviously 1,000 times as large and must be assembled in memory.
> >
> > I rather doubt the phonetic filter is to blame. But you can test this by just omitting the field containing the phonetic filter in the search query. I've certainly been wrong before.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean wrote:
> > > Hi,
> > >
> > > Thank you both for your responses.
> > >
> > > I just have the solr log for the very last period of the GC log.
> > >
> > > A grep command allows me to count queries per minute with hits > 1000 or > 1 and so with the biggest impact on memory and cpu during faceting:
> > >
> > > > 1000
> > > 59 11:13
> > > 45 11:14
> > > 36 11:15
> > > 45 11:16
> > > 59 11:17
> > > 40 11:18
> > > 95 11:19
> > > 123 11:20
> > > 137 11:21
> > > 123 11:22
> > > 86 11:23
> > > 26 11:24
> > > 19 11:25
> > > 17 11:26
> > >
> > > > 1
> > > 55 11:19
> > > 78 11:20
> > > 48 11:21
> > > 134 11:22
> > > 93 11:23
> > > 10 11:24
> > >
> > > So we see that at the time GC starts to become nuts, the large result set count increases.
> > >
> > > The query field includes a phonetic filter and
Re: JVM GC Issue
Hi Toke,

Nearly 30% of the requests are setting facet.limit=200.

Over 42000 requests, the number of times each field is used for faceting is:

$ grep 'facet=true' select.log | grep -oP 'facet.field=([^&])*' | sort | uniq -c | sort -r
  23119 facet.field=category_path
   8643 facet.field=EUR_0_price_decimal
   5560 facet.field=type_pratique_facet
   5560 facet.field=size_facet_facet
   5560 facet.field=marque_facet
   5560 facet.field=is_marketplace_origin_facet
   5560 facet.field=gender_facet
   5560 facet.field=color_facet_facet
   5560 facet.field=club_facet
   5560 facet.field=age_facet
   3290 facet.field=durete_facet
   3290 facet.field=diametre_roues_facet
    169 facet.field=EUR_1_price_decimal
     38 facet.field=EUR_4_price_decimal

The largest unique-value counts for these fields are:

category_path        3025
marque_facet         2100
size_facet_facet     1400
type_pratique_facet   166
color_facet_facet     165

Here are 2 typical queries:

2017-11-20 10:13:27.585 INFO (qtp592179046-15153) [ x:french] o.a.s.c.S.Request [french] webapp=/solr path=/select params={mm=100%25=category_path=age_facet=is_marketplace_origin_facet=type_pratique_facet=gender_facet=color_facet_facet=size_facet_facet=EUR_0_price_decimal=EUR_0_price_decimal=marque_facet=club_facet=diametre_roues_facet=durete_facet=*:*& json.nl=map=products_id,product_type_static,name_varchar,store_id,website_id,EUR_0_price_decimal=0=(store_id:"1")+AND+(website_id:"1")+AND+(product_status:"1")+AND+(category_id:"3389")+AND+(filter_visibility_int:"2"+OR+filter_visibility_int:"4")=48=is_marketplace_origin_boost_exact:"OUI"^210+is_marketplace_origin_boost:OUI^207+is_marketplace_origin_relative_boost:OUI^203+is_marketplace_origin_boost_exact:"NON"^210+is_marketplace_origin_boost:NON^207+is_marketplace_origin_relative_boost:NON^203+=*:*=200==edismax=textSearch=true=1=true=json=EUR_0_price_decimal=sort_EUR_0_special_price_decimal=true=1511172807} hits=953 status=0 QTime=26

2017-11-20 10:17:28.193 INFO (qtp592179046-17115) [ x:french] o.a.s.c.S.Request [french] webapp=/solr path=/select params={mm=100%25=category_path=age_facet=is_marketplace_origin_facet=type_pratique_facet=gender_facet=color_facet_facet=size_facet_facet=EUR_0_price_decimal=marque_facet=club_facet& json.nl=map=products_id,product_type_static,name_varchar,store_id,website_id,EUR_0_price_decimal=0=(store_id:"1")+AND+(website_id:"1")+AND+(product_status:"1")+AND+(filter_visibility_int:"3"+OR+filter_visibility_int:"4")8=name_boost_exact:"velo"^120+name_boost:"velo"^100+name_relative_boost:velo^80+category_boost:velo^60+is_marketplace_origin_boost_exact:"OUI"^210+is_marketplace_origin_boost:OUI^207+is_marketplace_origin_relative_boost:OUI^203+is_marketplace_origin_boost_exact:"NON"^210+is_marketplace_origin_boost:NON^207+is_marketplace_origin_relative_boost:NON^203+size_facet_boost_exact:"velo"^299+size_facet_boost:velo^296+size_facet_relative_boost:velo^292+marque_boost_exact:"velo"^359+marque_boost:velo^356+marque_relative_boost:velo^352+=velo=200=velo=edismax=textSearch=true=1=true=json=EUR_0_price_decimal=sort_EUR_0_special_price_decimal=true=1511173047} hits=6761 status=0 QTime=38

Dominique

On Sat, Dec 2, 2017 at 4:23 PM, Toke Eskildsen wrote:
> Dominique Bejean wrote:
> > Hi, Thank you for the explanations about faceting. I was thinking the hit count had the biggest impact on the facet memory lifecycle.
>
> Only if you have a very high facet.limit. Could you provide us with a typical query, including all the parameters?
>
> - Toke Eskildsen

--
Dominique Béjean
06 08 46 12 43
Re: JVM GC Issue
Dominique Bejean wrote:
> Hi, Thank you for the explanations about faceting. I was thinking the hit count had the biggest impact on the facet memory lifecycle.

Only if you have a very high facet.limit. Could you provide us with a typical query, including all the parameters?

- Toke Eskildsen
Re: JVM GC Issue
Hi,

Thank you for the explanations about faceting. I was thinking the hit count had the biggest impact on the facet memory lifecycle. Regardless of the hit count, there is a query peak at the time the issue occurs. This is relative in regard to what Solr is supposed to be able to handle, but this should be sufficient to explain GC activity growing:

198 10:07
208 10:08
267 10:09
285 10:10
244 10:11
286 10:12
277 10:13
252 10:14
183 10:15
302 10:16
299 10:17
273 10:18
348 10:19
468 10:20
496 10:21
673 10:22
496 10:23
101 10:24

At the time the issue occurs, we see the CPU activity growing very high. Maybe there is a lack of CPU. So, I will suggest all actions that will remove pressure on heap memory:

- enable docValues
- divide cache sizes by 2 in order to go back to the Solr defaults
- refine the fl parameter, as I know it can be optimized

Concerning the phonetic filter, it will be removed anyway, as a large number of results are really irrelevant.

Regards.

Dominique

On Sat, Dec 2, 2017 at 4:25 AM, Erick Erickson wrote:
> Dominique:
>
> Actually, the memory requirements shouldn't really go up as the number of hits increases. The general algorithm is (say rows=10):
> Calculate the score of each doc.
> If the score is zero, ignore it.
> If the score is > the minimum in my current top 10, replace the lowest scoring doc in my current top 10 with the new doc (a PriorityQueue, last I knew).
> Else discard the doc.
>
> When all docs have been scored, assemble the return from the top 10 (or whatever rows is set to).
>
> The key here is that most of the Solr index is kept in MMapDirectory/OS space, see Uwe's excellent blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html. In terms of _searching_, very little of the Lucene index structures is kept in memory.
>
> That said, faceting plays a bit loose with the rules. If you have docValues set to true, most of the memory structures are in the OS memory space, not the JVM. If you have docValues set to false, then the "uninverted" structure is built in the JVM heap space.
>
> Additionally, the JVM requirements are sensitive to the number of unique values in the field being faceted on. For instance, let's say you faceted by a date field with just facet.field=some_date_field. A bucket would have to be allocated to hold the counts for each and every unique date value, i.e. one for each millisecond in your search, which might be something you're seeing. Conceptually this is just an array[uniqueValues] of ints (longs? I'm not sure). This should be relatively easily testable by omitting the facets while measuring.
>
> Where the number of rows _does_ make a difference is in the return packet. Say I have rows=10. In that case I create a single return packet with all 10 docs' "fl" fields. If rows = 10,000 then that return packet is obviously 1,000 times as large and must be assembled in memory.
>
> I rather doubt the phonetic filter is to blame. But you can test this by just omitting the field containing the phonetic filter in the search query. I've certainly been wrong before.
>
> Best,
> Erick
>
> On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean wrote:
> > Hi,
> >
> > Thank you both for your responses.
> >
> > I just have the solr log for the very last period of the GC log.
> >
> > A grep command allows me to count queries per minute with hits > 1000 or > 1 and so with the biggest impact on memory and cpu during faceting:
> >
> > > 1000
> > 59 11:13
> > 45 11:14
> > 36 11:15
> > 45 11:16
> > 59 11:17
> > 40 11:18
> > 95 11:19
> > 123 11:20
> > 137 11:21
> > 123 11:22
> > 86 11:23
> > 26 11:24
> > 19 11:25
> > 17 11:26
> >
> > > 1
> > 55 11:19
> > 78 11:20
> > 48 11:21
> > 134 11:22
> > 93 11:23
> > 10 11:24
> >
> > So we see that at the time GC starts to become nuts, the large result set count increases.
> >
> > The query field includes a phonetic filter and the results are really not relevant due to this. I will suggest to:
> >
> > 1/ remove the phonetic filter in order to have fewer irrelevant results and so get a smaller result set
> > 2/ enable docValues on the fields used for faceting
> >
> > I expect this to decrease GC requirements and stabilize GC.
> >
> > Regards
> >
> > Dominique
> >
> > On Fri, Dec 1, 2017 at 6:17 PM, Erick Erickson wrote:
> > > Your autowarm counts are rather high, but as Toke says this doesn't seem outrageous.
> > >
> > > I have seen situations where Solr is running close to the limits of its heap and GC only reclaims a tiny bit of memory each time. When you say "full GC with no memory reclaimed", is that really no memory _at all_? Or "almost no memory"?
Re: JVM GC Issue
Dominique:

Actually, the memory requirements shouldn't really go up as the number of hits increases. The general algorithm is (say rows=10):
Calculate the score of each doc.
If the score is zero, ignore it.
If the score is > the minimum in my current top 10, replace the lowest scoring doc in my current top 10 with the new doc (a PriorityQueue, last I knew).
Else discard the doc.

When all docs have been scored, assemble the return from the top 10 (or whatever rows is set to).

The key here is that most of the Solr index is kept in MMapDirectory/OS space, see Uwe's excellent blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html. In terms of _searching_, very little of the Lucene index structures is kept in memory.

That said, faceting plays a bit loose with the rules. If you have docValues set to true, most of the memory structures are in the OS memory space, not the JVM. If you have docValues set to false, then the "uninverted" structure is built in the JVM heap space.

Additionally, the JVM requirements are sensitive to the number of unique values in the field being faceted on. For instance, let's say you faceted by a date field with just facet.field=some_date_field. A bucket would have to be allocated to hold the counts for each and every unique date value, i.e. one for each millisecond in your search, which might be something you're seeing. Conceptually this is just an array[uniqueValues] of ints (longs? I'm not sure). This should be relatively easily testable by omitting the facets while measuring.

Where the number of rows _does_ make a difference is in the return packet. Say I have rows=10. In that case I create a single return packet with all 10 docs' "fl" fields. If rows = 10,000 then that return packet is obviously 1,000 times as large and must be assembled in memory.

I rather doubt the phonetic filter is to blame. But you can test this by just omitting the field containing the phonetic filter in the search query. I've certainly been wrong before.

Best,
Erick

On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean wrote:
> Hi,
>
> Thank you both for your responses.
>
> I just have the solr log for the very last period of the GC log.
>
> A grep command allows me to count queries per minute with hits > 1000 or > 1 and so with the biggest impact on memory and cpu during faceting:
>
> > 1000
> 59 11:13
> 45 11:14
> 36 11:15
> 45 11:16
> 59 11:17
> 40 11:18
> 95 11:19
> 123 11:20
> 137 11:21
> 123 11:22
> 86 11:23
> 26 11:24
> 19 11:25
> 17 11:26
>
> > 1
> 55 11:19
> 78 11:20
> 48 11:21
> 134 11:22
> 93 11:23
> 10 11:24
>
> So we see that at the time GC starts to become nuts, the large result set count increases.
>
> The query field includes a phonetic filter and the results are really not relevant due to this. I will suggest to:
>
> 1/ remove the phonetic filter in order to have fewer irrelevant results and so get a smaller result set
> 2/ enable docValues on the fields used for faceting
>
> I expect this to decrease GC requirements and stabilize GC.
>
> Regards
>
> Dominique
>
> On Fri, Dec 1, 2017 at 6:17 PM, Erick Erickson wrote:
> > Your autowarm counts are rather high, but as Toke says this doesn't seem outrageous.
> >
> > I have seen situations where Solr is running close to the limits of its heap and GC only reclaims a tiny bit of memory each time. When you say "full GC with no memory reclaimed", is that really no memory _at all_? Or "almost no memory"? This situation can be alleviated by allocating more memory to the JVM.
> >
> > Your JVM pressure would certainly be reduced by enabling docValues on any field you sort, facet or group on. That would require a full reindex of course. Note that this makes your index on disk bigger, but reduces JVM pressure by roughly the same amount, so it's a win in this situation.
> >
> > Have you attached a memory profiler to the running Solr instance? I'd be curious where the memory is being allocated.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen wrote:
> > > Dominique Bejean wrote:
> > > > We are encountering an issue with GC.
> > > >
> > > > Randomly, nearly once a day, there are consecutive full GCs with no memory reclaimed.
> > >
> > > [... 1.2M docs, Xmx 6GB ...]
> > >
> > > > Gceasy suggests to increase heap size, but I do not agree.
> > >
> > > It does seem strange, with your apparently modest index & workload. Nothing you say sounds problematic to me and you have covered the usual culprits: overlapping searchers, faceting and filterCache.
> > >
> > > Is it possible for you to share the solr.log around the two times that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01 08:00-12:00.
> > >
> > > If you cannot share, please check if you have excessive
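A toy version of the top-N collection Erick outlines above (my sketch, not Solr's actual collector): a bounded min-heap keeps only the best `rows` hits seen so far, so the memory needed for collection stays proportional to rows rather than to the number of matching documents.

import java.util.PriorityQueue;

public class TopNSketch {
    static final class ScoredDoc {
        final int docId; final float score;
        ScoredDoc(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Min-heap by score: the head is the weakest of the current top `rows` docs.
    static PriorityQueue<ScoredDoc> collect(Iterable<ScoredDoc> hits, int rows) {
        PriorityQueue<ScoredDoc> top =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (ScoredDoc d : hits) {
            if (d.score <= 0) continue;                 // score of zero: ignore
            if (top.size() < rows) {
                top.add(d);                             // heap not full yet
            } else if (d.score > top.peek().score) {
                top.poll();                             // evict the current lowest-scoring doc
                top.add(d);                             // replace it with the better one
            }                                           // else: discard the doc
        }
        return top;                                     // the response is assembled from these docs only
    }
}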
Re: JVM GC Issue
Hi,

Thank you both for your responses.

I just have the solr log for the very last period of the GC log.

A grep command allows me to count queries per minute with hits > 1000 or > 1 and so with the biggest impact on memory and cpu during faceting:

> 1000
59 11:13
45 11:14
36 11:15
45 11:16
59 11:17
40 11:18
95 11:19
123 11:20
137 11:21
123 11:22
86 11:23
26 11:24
19 11:25
17 11:26

> 1
55 11:19
78 11:20
48 11:21
134 11:22
93 11:23
10 11:24

So we see that at the time GC starts to become nuts, the large result set count increases.

The query field includes a phonetic filter and the results are really not relevant due to this. I will suggest to:

1/ remove the phonetic filter in order to have fewer irrelevant results and so get a smaller result set
2/ enable docValues on the fields used for faceting

I expect this to decrease GC requirements and stabilize GC.

Regards

Dominique

On Fri, Dec 1, 2017 at 6:17 PM, Erick Erickson wrote:
> Your autowarm counts are rather high, but as Toke says this doesn't seem outrageous.
>
> I have seen situations where Solr is running close to the limits of its heap and GC only reclaims a tiny bit of memory each time. When you say "full GC with no memory reclaimed", is that really no memory _at all_? Or "almost no memory"? This situation can be alleviated by allocating more memory to the JVM.
>
> Your JVM pressure would certainly be reduced by enabling docValues on any field you sort, facet or group on. That would require a full reindex of course. Note that this makes your index on disk bigger, but reduces JVM pressure by roughly the same amount, so it's a win in this situation.
>
> Have you attached a memory profiler to the running Solr instance? I'd be curious where the memory is being allocated.
>
> Best,
> Erick
>
> On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen wrote:
> > Dominique Bejean wrote:
> > > We are encountering an issue with GC.
> > >
> > > Randomly, nearly once a day, there are consecutive full GCs with no memory reclaimed.
> >
> > [... 1.2M docs, Xmx 6GB ...]
> >
> > > Gceasy suggests to increase heap size, but I do not agree.
> >
> > It does seem strange, with your apparently modest index & workload. Nothing you say sounds problematic to me and you have covered the usual culprits: overlapping searchers, faceting and filterCache.
> >
> > Is it possible for you to share the solr.log around the two times that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01 08:00-12:00.
> >
> > If you cannot share, please check if you have excessive traffic around that time or if there is a lot of UnInverting going on (triggered by faceting on non-DocValues String fields). I know your post implies that you have already done so, so this is more of a sanity check.
> >
> > - Toke Eskildsen

--
Dominique Béjean
06 08 46 12 43
Re: JVM GC Issue
Your autowarm counts are rather high, but as Toke says this doesn't seem outrageous.

I have seen situations where Solr is running close to the limits of its heap and GC only reclaims a tiny bit of memory each time. When you say "full GC with no memory reclaimed", is that really no memory _at all_? Or "almost no memory"? This situation can be alleviated by allocating more memory to the JVM.

Your JVM pressure would certainly be reduced by enabling docValues on any field you sort, facet or group on. That would require a full reindex of course. Note that this makes your index on disk bigger, but reduces JVM pressure by roughly the same amount, so it's a win in this situation.

Have you attached a memory profiler to the running Solr instance? I'd be curious where the memory is being allocated.

Best,
Erick

On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen wrote:
> Dominique Bejean wrote:
> > We are encountering an issue with GC.
> >
> > Randomly, nearly once a day, there are consecutive full GCs with no memory reclaimed.
>
> [... 1.2M docs, Xmx 6GB ...]
>
> > Gceasy suggests to increase heap size, but I do not agree.
>
> It does seem strange, with your apparently modest index & workload. Nothing you say sounds problematic to me and you have covered the usual culprits: overlapping searchers, faceting and filterCache.
>
> Is it possible for you to share the solr.log around the two times that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01 08:00-12:00.
>
> If you cannot share, please check if you have excessive traffic around that time or if there is a lot of UnInverting going on (triggered by faceting on non-DocValues String fields). I know your post implies that you have already done so, so this is more of a sanity check.
>
> - Toke Eskildsen
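For reference, enabling docValues is a schema change along these lines -- a sketch only, using field names taken from this thread; the field types and multiValued settings are assumptions that have to match the real schema, and as Erick says the change requires a full reindex:

<!-- schema.xml / managed-schema sketch: docValues on fields used for faceting and sorting,
     so the column data is memory-mapped from disk instead of uninverted onto the heap -->
<field name="category_path"       type="string"  indexed="true" stored="false" docValues="true" multiValued="true"/>
<field name="marque_facet"        type="string"  indexed="true" stored="false" docValues="true"/>
<field name="EUR_0_price_decimal" type="pdouble" indexed="true" stored="true"  docValues="true"/>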
Re: JVM GC Issue
Dominique Bejean wrote:
> We are encountering an issue with GC.
>
> Randomly, nearly once a day, there are consecutive full GCs with no memory reclaimed.

[... 1.2M docs, Xmx 6GB ...]

> Gceasy suggests to increase heap size, but I do not agree.

It does seem strange, with your apparently modest index & workload. Nothing you say sounds problematic to me and you have covered the usual culprits: overlapping searchers, faceting and filterCache.

Is it possible for you to share the solr.log around the two times that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01 08:00-12:00.

If you cannot share, please check if you have excessive traffic around that time or if there is a lot of UnInverting going on (triggered by faceting on non-DocValues String fields). I know your post implies that you have already done so, so this is more of a sanity check.

- Toke Eskildsen