Hi Nitin,

Thanks for letting us know about the OOM issues. These are serious and we 
should focus on finding the cause and fixing them. In general, it is the goal 
of the Drill project that Drill suffer no OOM errors on a cluster configured 
properly for your target workload.

Thank you for filing a JIRA ticket. The stack trace in that ticket describes a 
connection shut down. Your e-mail mentioned an OOM error. Can you attach a 
stack trace or log entry that led you to believe you were getting an OOM error? 
How many queries are running at the time of the error?
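
If it helps, something like the following should pull the relevant entries out 
of a Drillbit log. This is only a sketch: the function name find_oom is made 
up for illustration, and the default path assumes your logs live under 
$DRILL_HOME/log, which may not match your install.

```shell
# Hypothetical helper: print OOM-related lines from a Drillbit log.
# The default path is an assumption; pass your own log file as the first argument.
find_oom() {
  log_file="${1:-${DRILL_HOME:-/opt/drill}/log/drillbit.log}"
  # -n prints line numbers; -A 5 keeps a few lines of stack trace after each match.
  grep -n -A 5 "OutOfMemory" "$log_file"
}
```

Run it as find_oom /path/to/drillbit.log. A "java.lang.OutOfMemoryError: Java 
heap space" entry would point to the heap; a message about direct memory or 
buffer allocation would more likely point to off-heap memory.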

As you know, Drill uses two kinds of memory: heap and off-heap (also known as 
"direct" or "unsafe" memory). Generally, you want much more off-heap than heap 
memory. But until we know which kind is being exhausted, it is hard to say 
what to adjust.

If a Drillbit fails, all queries anywhere on the cluster will fail. The reason 
is simple: all queries are distributed across all nodes. This is why we must 
find and fix the underlying OOM error.

On a 64 GB machine, if you are running only Drill, you can give most of the 
memory to Drill itself. Determine how much your OS and other processes need. 
Then, split the rest between heap and off-heap. It is very likely you have 
already customized the Drill memory settings: it is the first thing everyone 
does when deploying. [1] Check your settings.
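
For illustration only, the split might look like this in conf/drill-env.sh. 
The numbers are my assumptions (a 64 GB node running only Drill, with roughly 
8 GB left for the OS), not a recommendation for your workload.

```shell
# conf/drill-env.sh (illustrative values only, not a recommendation)
# Assumes a 64 GB node running only Drill, with roughly 8 GB left for the OS.
export DRILL_HEAP=${DRILL_HEAP:-"8G"}                             # JVM heap
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"48G"}  # off-heap (direct) memory
```

The Drillbit must be restarted for changes to these values to take effect.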

Until we know if you are running out of heap vs. off-heap, it is hard to 
suggest which setting to adjust. If it is heap memory that is affected, then 
you can increase the heap memory setting to see what effect that has on 
Drillbit lifetime.

Thanks,
- Paul

[1] http://drill.apache.org/docs/configuring-drill-memory/

On Tuesday, January 7, 2020, 08:45:46 AM PST, Nitin Pawar 
<nitinpawar...@gmail.com> wrote:
 
Hello Team,

We recently upgraded from drill-1.13 to drill-1.16 and have started to notice
lots of OOM issues. It is the same setup with only the binaries changed.
Until we figure out what the issue is, we want to keep restarting the
Drillbits with cron jobs.

My question is: *If a Drillbit is restarted, would the queries with this
node as foreman be resubmitted automatically?*

Also, we have 64 GB RAM machines. Can someone recommend memory settings for
this environment?

-- 
Nitin Pawar  
