Re: OOMs in Solr
bq: ...so I wonder if reducing the heap is going to help or it won’t matter that much...

Well, if you're hitting OOM errors then you have no _choice_ but to reduce the heap usage. Or increase the memory. And you don't have much physical memory to grow into.

Longer term, reducing the JVM size (assuming you can w/o hitting OOM errors) is always to the good. The more heap, the more GC you have, the longer stop-the-world GC pauses will take, etc. The OS's memory management is vastly more efficient (because it's simpler) than Java's.

Note, however, that this is "more art than science". I've seen situations where the JVM requires very close to the max heap size at some point. From there I've seen situations where the GC kicks in and recovers just enough memory to continue for a few milliseconds and then goes right back into a GC cycle. So you need some overhead.

Or are you talking about SSDs for the OS to use for swapping? Assuming we're talking about query response time here, SSDs will be much faster if you're swapping. But you _really_ want to strive to _not_ swap. SSD access is faster than spinning disk for sure, but still vastly slower than RAM access.

I applaud you changing one thing at a time BTW. You probably want to use GCViewer or similar on the GC logs (turn them on first!) for Solr for a quick take on how GC is performing when you test.

And the one other thing I'd do: Mine your Solr (or servlet container) logs for the real queries over one of these periods. Then use something like JMeter (or roll your own test program) to fire them at your test instance to evaluate the effects of your changes.

Best,
Erick

On Mon, Dec 12, 2016 at 1:03 PM, Alfonso Muñoz-Pomer Fuentes wrote:
> According to the post you linked to, buying SSDs is strongly advised. I got in touch with the systems department in my organization and it turns out that our VM storage is SSD-backed, so I wonder if reducing the heap is going to help or it won’t matter that much. Of course, there’s nothing like trying it and checking the results. I’ll do that in due time, though. At the moment I’ve reduced the filter cache and will change all parameters one at a time to see what affects performance the most.
>
> Thanks again for the feedback.
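Erick's suggestion of mining the Solr logs for real queries could start from something like the following Python sketch. It assumes the request log entries have the shape `... path=/select params={...} hits=N status=0 QTime=N`; the regex, the function name `extract_queries` and the sample line are illustrative assumptions, so check them against your own logs before relying on them.

```python
import re

# Assumed shape of a Solr request log entry (verify against your own logs):
#   ... [core1] webapp=/solr path=/select params={q=*:*&rows=10} hits=42 status=0 QTime=17
REQUEST_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\} "
    r"(?:hits=\d+ )?status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)


def extract_queries(lines, path="/select"):
    """Yield (params, qtime_ms) for every successful request to `path`."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request log entry
        if match.group("path") != path or match.group("status") != "0":
            continue  # skip other handlers and failed requests
        yield match.group("params"), int(match.group("qtime"))


# Hypothetical sample entry, only here to show the expected input.
SAMPLE_LINE = (
    "INFO  - 2016-12-12 10:13:00.123; org.apache.solr.core.SolrCore; "
    "[core1] webapp=/solr path=/select "
    "params={q=*:*&fq={!tag=t}type:gene&rows=10} hits=42 status=0 QTime=17"
)

for params, qtime in extract_queries([SAMPLE_LINE]):
    print(f"{qtime} ms\t{params}")
```

The extracted parameter strings can then be fed to JMeter or to a small replay script pointed at the test instance, and the recorded QTime values give a baseline to compare against after each configuration change.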
Re: OOMs in Solr
According to the post you linked to, buying SSDs is strongly advised. I got in touch with the systems department in my organization and it turns out that our VM storage is SSD-backed, so I wonder if reducing the heap is going to help or it won’t matter that much. Of course, there’s nothing like trying it and checking the results. I’ll do that in due time, though. At the moment I’ve reduced the filter cache and will change all parameters one at a time to see what affects performance the most.

Thanks again for the feedback.

On 12/12/2016 19:36, Erick Erickson wrote:
> The biggest bang for the buck is _probably_ docValues for the fields you facet on. If that's the culprit, you can also reduce your JVM heap considerably; as Toke says, leaving this little memory for the OS is bad. Here's the writeup on why:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Roughly what's happening is that all the values you facet on have to be read into memory somewhere. docValues puts almost all of that into the OS memory rather than the JVM heap. It's much faster to load, reduces JVM GC pressure and OOMs, and allows the pages to be swapped out.
>
> However, this is somewhat pushing the problem around. Moving the memory consumption to the OS memory space will have a huge impact on your OOM errors, but the cost will be that you'll probably start swapping pages out of the OS memory, which will impact search speed. Slower searches are preferable to OOMs, certainly. That said, you'll probably need more physical memory at some point, or go to SolrCloud or
>
> Best,
> Erick
Re: OOMs in Solr
The biggest bang for the buck is _probably_ docValues for the fields you facet on. If that's the culprit, you can also reduce your JVM heap considerably; as Toke says, leaving this little memory for the OS is bad. Here's the writeup on why:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Roughly what's happening is that all the values you facet on have to be read into memory somewhere. docValues puts almost all of that into the OS memory rather than the JVM heap. It's much faster to load, reduces JVM GC pressure and OOMs, and allows the pages to be swapped out.

However, this is somewhat pushing the problem around. Moving the memory consumption to the OS memory space will have a huge impact on your OOM errors, but the cost will be that you'll probably start swapping pages out of the OS memory, which will impact search speed. Slower searches are preferable to OOMs, certainly. That said, you'll probably need more physical memory at some point, or go to SolrCloud or

Best,
Erick

On Mon, Dec 12, 2016 at 10:57 AM, Susheel Kumar wrote:
> Double check that your queries are not running into deep pagination (q=*:*...=). This is something I recently experienced and was the only cause of OOM. You may have the GC logs from when the OOM happened, and plotting them in GCViewer may give insight into how gradually your heap filled up and ran into OOM.
>
> Thanks,
> Susheel
Re: OOMs in Solr
Double check that your queries are not running into deep pagination (q=*:*...=). This is something I recently experienced and was the only cause of OOM. You may have the GC logs from when the OOM happened, and plotting them in GCViewer may give insight into how gradually your heap filled up and ran into OOM.

Thanks,
Susheel

On Mon, Dec 12, 2016 at 10:32 AM, Alfonso Muñoz-Pomer Fuentes <amu...@ebi.ac.uk> wrote:
> Thanks again.
>
> I’m learning more about Solr in this thread than in my previous months reading about it!
>
> Moving to Solr Cloud is a possibility we’ve discussed and I guess it will eventually happen, as the index will grow no matter what.
>
> I’ve already lowered filterCache from 512 to 64 and I’m looking forward to seeing what happens in the next few days. Our filter cache hit ratio was 0.99, so I would expect this to go down, but if we can have more efficient memory usage I think e.g. an extra second for each search is still acceptable.
>
> Regarding the startup scripts, we’re using the ones included with Solr.
>
> As for the use of filters, we’re always using the same four filters, IIRC. In any case we’ll review the code to ensure that that’s the case.
>
> I’m aware of the need to reindex when the schema changes, but thanks for the reminder. We’ll add docValues because I think that’ll make a significant difference in our case. We’ll also try to leave space for the disk cache as we’re using spinning disk storage.
>
> Thanks again to everybody for the useful and insightful replies.
>
> Alfonso
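The deep pagination problem mentioned in this message (and the "full data sets" use case from the original question) is usually addressed with Solr's `cursorMark` paging instead of a huge `start` or `rows` value. The Python sketch below shows the loop under a few assumptions that are not from the thread: the uniqueKey field is called `id`, and `fetch_page` is a hypothetical callable that sends the parameters to the `/select` handler and returns the parsed JSON response.

```python
import json
import urllib.parse
import urllib.request


def fetch_all(fetch_page, query, unique_key="id", rows=1000):
    """Yield every matching document using cursorMark paging.

    `fetch_page` takes a dict of request parameters and returns the
    decoded JSON response of a Solr /select request.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "rows": rows,
            # cursorMark requires a sort that includes the uniqueKey field
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # same cursor returned twice: no more results
        cursor = next_cursor


def http_fetcher(select_url):
    """Return a fetch_page function for a given /select URL (hypothetical URL)."""
    def fetch_page(params):
        url = select_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as reply:
            return json.load(reply)
    return fetch_page
```

A call such as `fetch_all(http_fetcher("http://localhost:8983/solr/core1/select"), "*:*")` (host and core name are placeholders) then streams the whole result set in pages of `rows` documents, so neither Solr nor the client has to hold the full set in memory at once.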
Re: OOMs in Solr
Thanks again.

I’m learning more about Solr in this thread than in my previous months reading about it!

Moving to Solr Cloud is a possibility we’ve discussed and I guess it will eventually happen, as the index will grow no matter what.

I’ve already lowered filterCache from 512 to 64 and I’m looking forward to seeing what happens in the next few days. Our filter cache hit ratio was 0.99, so I would expect this to go down, but if we can have more efficient memory usage I think e.g. an extra second for each search is still acceptable.

Regarding the startup scripts, we’re using the ones included with Solr.

As for the use of filters, we’re always using the same four filters, IIRC. In any case we’ll review the code to ensure that that’s the case.

I’m aware of the need to reindex when the schema changes, but thanks for the reminder. We’ll add docValues because I think that’ll make a significant difference in our case. We’ll also try to leave space for the disk cache as we’re using spinning disk storage.

Thanks again to everybody for the useful and insightful replies.

Alfonso

On 12/12/2016 14:12, Shawn Heisey wrote:
> An OOM indicates that a Java application is requesting more memory than it has been told it can use. There are only two remedies for OOM errors: Increase the heap, or make the program use less memory. In this email, I have concentrated on ways to reduce the memory requirements.
>
> These index sizes and document counts are relatively small for Solr -- as long as you have enough memory and are smart about how it's used.
>
> Solr 5.1.0 comes with GC tuning built into the startup scripts, using some well-tested CMS settings. If you are using those startup scripts, then the parallel collector will NOT be the default. No matter what collector is in use, it cannot fix OOM problems. It may change when and how frequently they occur, but it can't do anything about them.
>
> Like Toke, I suspect two things: a very large filterCache, and the heavy facet usage, maybe both. Enabling docValues on the fields you're using for faceting and reindexing will make the latter more memory efficient, and likely faster. Reducing the filterCache size would help the former. Note that if you have a completely static index, then it is more likely that you will fill up the filterCache over time.
>
> As I already mentioned, the first thing I'd check is the size of the filterCache. Reduce it, possibly so it's VERY small. Do everything you can to ensure that you are re-using filters, not sending many unique filters. One of the most common things that leads to low filter re-use is using the bare NOW keyword in date filters and queries. Use NOW/HOUR or NOW/DAY instead -- NOW changes once a millisecond, so it is typically unique for every query. FilterCache entries are huge, as you were told in another reply.
>
> Unless you use docValues, or utilize the facet.method parameter VERY carefully, each field you facet on will tie up a large section of memory containing the value for that field in EVERY document in the index. With the document counts you've got, this is a LOT of memory.
>
> It is strongly recommended to have docValues enabled on every field you're using for faceting. If you change the schema in this manner, a full reindex will be required before you can use that field again.
>
> There is another problem lurking here that Toke already touched on: Leaving only 2GB of RAM for the OS to handle disk caching will result in terrible performance.
>
> What you've been told by me and in other replies is discussed here:
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn

--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer
Re: OOMs in Solr
On 12/12/2016 3:13 AM, Alfonso Muñoz-Pomer Fuentes wrote:
> I’m writing because in our web application we’re using Solr 5.1.0 and currently we’re hosting it on a VM with 32 GB of RAM (of which 30 are dedicated to Solr and nothing else is running there). We have four cores, that are this size:
> - 25.56 GB, Num Docs = 57,860,845
> - 12.09 GB, Num Docs = 173,491,631
>
> (The other two cores are about 10 MB, 20k docs)

An OOM indicates that a Java application is requesting more memory than it has been told it can use. There are only two remedies for OOM errors: Increase the heap, or make the program use less memory. In this email, I have concentrated on ways to reduce the memory requirements.

These index sizes and document counts are relatively small for Solr -- as long as you have enough memory and are smart about how it's used.

Solr 5.1.0 comes with GC tuning built into the startup scripts, using some well-tested CMS settings. If you are using those startup scripts, then the parallel collector will NOT be the default. No matter what collector is in use, it cannot fix OOM problems. It may change when and how frequently they occur, but it can't do anything about them.

> We aren’t indexing on this machine, and we’re getting OOM relatively quickly (after about 14 hours of regular use). Right now we have a Cron job that restarts Solr every 12 hours, so it’s not pretty. We use faceting quite heavily and mostly as a document storage server (we want full data sets instead of the n most relevant results).

Like Toke, I suspect two things: a very large filterCache, and the heavy facet usage, maybe both. Enabling docValues on the fields you're using for faceting and reindexing will make the latter more memory efficient, and likely faster. Reducing the filterCache size would help the former. Note that if you have a completely static index, then it is more likely that you will fill up the filterCache over time.

> I don’t know if what we’re experiencing is usual given the index size and memory constraint of the VM, or something looks like it’s wildly misconfigured. What do you think? Any useful pointers for some tuning we could do to improve the service? Would upgrading to Solr 6 make sense?

As I already mentioned, the first thing I'd check is the size of the filterCache. Reduce it, possibly so it's VERY small. Do everything you can to ensure that you are re-using filters, not sending many unique filters. One of the most common things that leads to low filter re-use is using the bare NOW keyword in date filters and queries. Use NOW/HOUR or NOW/DAY instead -- NOW changes once a millisecond, so it is typically unique for every query. FilterCache entries are huge, as you were told in another reply.

Unless you use docValues, or utilize the facet.method parameter VERY carefully, each field you facet on will tie up a large section of memory containing the value for that field in EVERY document in the index. With the document counts you've got, this is a LOT of memory.

It is strongly recommended to have docValues enabled on every field you're using for faceting. If you change the schema in this manner, a full reindex will be required before you can use that field again.

There is another problem lurking here that Toke already touched on: Leaving only 2GB of RAM for the OS to handle disk caching will result in terrible performance.

What you've been told by me and in other replies is discussed here:
https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
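To illustrate the NOW/DAY advice, here is a small Python sketch of how a client might build its date filter. The field name `timestamp` and both helper names are made up for the example and are not taken from the thread; the date math syntax (`NOW/DAY`, `-7DAYS`, `+1DAY`) is standard Solr.

```python
def rounded_range_filter(field, days_back):
    """Build a date-range fq that resolves to the same range all day.

    NOW/DAY rounds down to the start of the current day, so every request
    sent on the same day yields the same filter and can share one
    filterCache entry.
    """
    return f"{field}:[NOW/DAY-{days_back}DAYS TO NOW/DAY+1DAY]"


def bare_now_filter(field, days_back):
    """Anti-pattern, shown for contrast only.

    The string looks constant, but Solr resolves NOW to the current
    millisecond on every request, so each request produces a distinct
    filter and a new filterCache entry.
    """
    return f"{field}:[NOW-{days_back}DAYS TO NOW]"


# Request parameters for "documents from the last week" (hypothetical field).
params = {"q": "*:*", "fq": rounded_range_filter("timestamp", 7)}
print(params["fq"])
```

The trade-off is precision: the rounded filter covers whole days rather than an exact sliding window, which is usually acceptable for cached filters.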
RE: OOMs in Solr
You can also try the following:

1. Reduce the thread stack size using the -Xss flag.
2. Try to use sharding instead of a single large instance (if possible).
3. Reduce the cache sizes in solrconfig.xml.

Regards,
Prateek Jain

-----Original Message-----
From: Alfonso Muñoz-Pomer Fuentes [mailto:amu...@ebi.ac.uk]
Sent: 12 December 2016 01:31 PM
To: solr-user@lucene.apache.org
Subject: Re: OOMs in Solr

I wasn’t aware of docValues and filterCache policies. We’ll try to fine-tune them and see if it helps.

Thanks so much for the info!
Re: OOMs in Solr
I wasn’t aware of docValues and filterCache policies. We’ll try to fine-tune it and see if it helps. Thanks so much for the info!

On 12/12/2016 12:13, Toke Eskildsen wrote:
> On Mon, 2016-12-12 at 10:13 +, Alfonso Muñoz-Pomer Fuentes wrote:
>> I’m writing because in our web application we’re using Solr 5.1.0
>> and currently we’re hosting it on a VM with 32 GB of RAM (of which 30
>> are dedicated to Solr and nothing else is running there).
>
> This leaves very little memory for disk cache. I hope your underlying
> storage is local SSDs and not spinning drives over the network.
>
>> We have four cores, that are this size:
>> - 25.56 GB, Num Docs = 57,860,845
>> - 12.09 GB, Num Docs = 173,491,631
>
> Smallish in bytes, largish in document count.
>
>> We aren’t indexing on this machine, and we’re getting OOM relatively
>> quickly (after about 14 hours of regular use).
>
> The usual suspect for OOMs after some time is the filterCache.
> Worst-case entries in that one take up 1 bit/document, which means 7MB
> and 22MB respectively for the two collections above. If your
> filterCache is set to 1000 for those, this means
> (7MB+22MB)*1000 ~= all your heap.
>
>> Right now we have a Cron job that restarts Solr every 12 hours, so
>> it’s not pretty. We use faceting quite heavily
>
> Hopefully on docValued fields?
>
>> and mostly as a document storage server (we want full data sets
>> instead of the n most relevant results).
>
> Hopefully with deep paging, as opposed to rows=173491631?
>
>> I don’t know if what we’re experiencing is usual given the index size
>> and memory constraint of the VM, or something looks like it’s wildly
>> misconfigured.
>
> I would have guessed that your heap was quite large enough for a
> static index, but that is just ... guesswork.
>
>> Would upgrading to Solr 6 make sense?
>
> It would not help in itself, but if you also switched to using
> streaming for your assumedly large exports, it would lower memory
> requirements.
>
> - Toke Eskildsen, State and University Library, Denmark

--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer
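
The worst-case filterCache estimate from Toke's reply can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the cache size of 1000 is the hypothetical value used in the reply, not a known setting of this installation. It assumes every cached entry is a full bitset of 1 bit per document, which is the worst case named in the reply.

```python
def filter_cache_worst_case_bytes(num_docs: int, cache_size: int) -> int:
    """Upper bound on filterCache memory for one core.

    Assumes every cached filter is a full bitset with 1 bit per document
    (the worst case), rounded up to whole bytes.
    """
    bytes_per_entry = (num_docs + 7) // 8  # ceil(num_docs / 8)
    return bytes_per_entry * cache_size


# Document counts of the two large cores, taken from the thread.
CORE_DOC_COUNTS = [57_860_845, 173_491_631]

# Hypothetical cache size from Toke's reply ("if your filterCache is set to 1000").
ASSUMED_CACHE_SIZE = 1000

total_bytes = sum(
    filter_cache_worst_case_bytes(docs, ASSUMED_CACHE_SIZE)
    for docs in CORE_DOC_COUNTS
)

if __name__ == "__main__":
    for docs in CORE_DOC_COUNTS:
        per_entry_mb = filter_cache_worst_case_bytes(docs, 1) / 1e6
        print(f"{docs:>11,} docs: {per_entry_mb:.1f} MB per cached filter")
    print(f"worst case for both cores: {total_bytes / 1e9:.1f} GB")
```

Under these assumptions the two caches together could reach roughly 28.9 GB (decimal), which is in the same range as the 30 GB heap and matches the "~= all your heap" remark.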
Re: OOMs in Solr
Thanks for the reply. Here’s some more info...

Disk space: 39 GB / 148 GB (used / available)
Deployment model: Single instance
JVM version: 1.7.0_04
Number of queries: avgRequestsPerSecond: 0.5478469104833896
GC algorithm: None specified, so I guess it defaults to the parallel GC.

On 12/12/2016 10:22, Prateek Jain J wrote:
> Please provide some information like,
>
> disk space available
> deployment model of solr like solr-cloud or single instance
> jvm version
> no. of queries and type of queries etc.
> GC algorithm used etc.
>
> Regards,
> Prateek Jain
>
> -Original Message-
> From: Alfonso Muñoz-Pomer Fuentes [mailto:amu...@ebi.ac.uk]
> Sent: 12 December 2016 10:14 AM
> To: solr-user@lucene.apache.org
> Subject: OOMs in Solr
>
> Hi Solr users,
>
> I’m writing because in our web application we’re using Solr 5.1.0 and
> currently we’re hosting it on a VM with 32 GB of RAM (of which 30 are
> dedicated to Solr and nothing else is running there). We have four
> cores, that are this size:
> - 25.56 GB, Num Docs = 57,860,845
> - 12.09 GB, Num Docs = 173,491,631
> (The other two cores are about 10 MB, 20k docs)
>
> We aren’t indexing on this machine, and we’re getting OOM relatively
> quickly (after about 14 hours of regular use). Right now we have a Cron
> job that restarts Solr every 12 hours, so it’s not pretty. We use
> faceting quite heavily and mostly as a document storage server (we want
> full data sets instead of the n most relevant results).
>
> I don’t know if what we’re experiencing is usual given the index size
> and memory constraint of the VM, or something looks like it’s wildly
> misconfigured. What do you think? Any useful pointers for some tuning
> we could do to improve the service? Would upgrading to Solr 6 make
> sense?
>
> Thanks a lot in advance.
--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer
Re: OOMs in Solr
On Mon, 2016-12-12 at 10:13 +, Alfonso Muñoz-Pomer Fuentes wrote:
> I’m writing because in our web application we’re using Solr 5.1.0
> and currently we’re hosting it on a VM with 32 GB of RAM (of which 30
> are dedicated to Solr and nothing else is running there).

This leaves very little memory for disk cache. I hope your underlying storage is local SSDs and not spinning drives over the network.

> We have four cores, that are this size:
> - 25.56 GB, Num Docs = 57,860,845
> - 12.09 GB, Num Docs = 173,491,631

Smallish in bytes, largish in document count.

> We aren’t indexing on this machine, and we’re getting OOM relatively
> quickly (after about 14 hours of regular use).

The usual suspect for OOMs after some time is the filterCache. Worst-case entries in that one take up 1 bit/document, which means 7MB and 22MB respectively for the two collections above. If your filterCache is set to 1000 for those, this means (7MB+22MB)*1000 ~= all your heap.

> Right now we have a Cron job that restarts Solr every 12 hours, so
> it’s not pretty. We use faceting quite heavily

Hopefully on docValued fields?

> and mostly as a document storage server (we want full data sets
> instead of the n most relevant results).

Hopefully with deep paging, as opposed to rows=173491631?

> I don’t know if what we’re experiencing is usual given the index size
> and memory constraint of the VM, or something looks like it’s wildly
> misconfigured.

I would have guessed that your heap was quite large enough for a static index, but that is just ... guesswork.

> Would upgrading to Solr 6 make sense?

It would not help in itself, but if you also switched to using streaming for your assumedly large exports, it would lower memory requirements.

- Toke Eskildsen, State and University Library, Denmark
RE: OOMs in Solr
Please provide some information like,

disk space available
deployment model of solr like solr-cloud or single instance
jvm version
no. of queries and type of queries etc.
GC algorithm used etc.

Regards,
Prateek Jain

-Original Message-
From: Alfonso Muñoz-Pomer Fuentes [mailto:amu...@ebi.ac.uk]
Sent: 12 December 2016 10:14 AM
To: solr-user@lucene.apache.org
Subject: OOMs in Solr

Hi Solr users,

I’m writing because in our web application we’re using Solr 5.1.0 and currently we’re hosting it on a VM with 32 GB of RAM (of which 30 are dedicated to Solr and nothing else is running there). We have four cores, that are this size:
- 25.56 GB, Num Docs = 57,860,845
- 12.09 GB, Num Docs = 173,491,631
(The other two cores are about 10 MB, 20k docs)

We aren’t indexing on this machine, and we’re getting OOM relatively quickly (after about 14 hours of regular use). Right now we have a Cron job that restarts Solr every 12 hours, so it’s not pretty. We use faceting quite heavily and mostly as a document storage server (we want full data sets instead of the n most relevant results).

I don’t know if what we’re experiencing is usual given the index size and memory constraint of the VM, or something looks like it’s wildly misconfigured. What do you think? Any useful pointers for some tuning we could do to improve the service? Would upgrading to Solr 6 make sense?

Thanks a lot in advance.

--
Alfonso Muñoz-Pomer Fuentes
Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer