Hi Nitin,

Thanks for letting us know about the OOM issues. These are serious and we should focus on finding the cause and fixing them. In general, it is the goal of the Drill project that Drill suffer no OOM errors on a cluster configured properly for your target workload.
Thank you for filing a JIRA ticket. The stack trace in that ticket describes a connection shutdown, but your e-mail mentioned an OOM error. Can you attach a stack trace or log entry that led you to believe you were getting an OOM error? How many queries are running at the time of the error?

As you know, Drill uses two kinds of memory: heap and off-heap (also known as "direct" or "unsafe"). Generally, you want much more off-heap than heap memory. But until we know which kind is being exhausted, it is hard to say what to adjust.

If a Drillbit fails, all queries anywhere on the cluster will fail. The reason is simple: all queries are distributed across all nodes. This is why we must find and fix the underlying OOM error.

On a 64 GB machine, if you are running only Drill, you can give most of the memory to Drill itself. Determine how much your OS and other processes need, then split the rest between heap and off-heap.

It is very likely you have already customized the Drill memory settings: it is the first thing everyone does when deploying. [1] Check your settings. If it turns out that heap memory is the one being exhausted, you can increase the heap memory setting to see what effect that has on Drillbit lifetime.

Thanks,

- Paul

[1] http://drill.apache.org/docs/configuring-drill-memory/

On Tuesday, January 7, 2020, 08:45:46 AM PST, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Hello Team,

We have recently upgraded to drill-1.16 from drill-1.13 and we have started to notice lots of OOM issues. It's the same setup with changed binaries.

Until we figure out what the issue is, we wanted to keep restarting drillbits with cronjobs.

My question is: *If a drill is restarted, would the queries with this node as foreman be resubmitted automatically?*

Also, we have 64GB RAM machines. Can someone recommend memory settings for this environment?

--
Nitin Pawar