Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/922
  
    This may be one of those times when we need to resort to a bit of design 
thinking.
    
    The core idea is that the user sets one environment variable (total 
memory) against which we check the others. The first issue: if the user 
can't do the sums to set the Drill memory allocation right (with respect to 
actual memory), it's not clear how they will get the total memory variable 
right either.
    
    OK, so we get the memory from the system, then do a percentage. That is 
better. But, what is the system memory? Is it total memory? Suppose the user 
says Drill gets 60%. We can now check. But, Drill is distributed. Newer nodes 
in a cluster may have 256GB, older nodes 128GB. Drill demands symmetrical 
resources so the memory given to Drill must be identical on all nodes, 
regardless of system memory. So, the percent of total system memory idea 
doesn't work in practice.
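    A minimal sketch of the percent-of-total idea (the `DRILL_MEM_PERCENT` 
variable, the helper name, and the `/proc/meminfo` parsing are my 
assumptions, not anything in the PR):

```shell
#!/bin/sh
# Hypothetical sketch: size Drill's allocation as a percentage of total
# system memory. DRILL_MEM_PERCENT and percent_of_total are illustrative
# names, not actual Drill settings.
DRILL_MEM_PERCENT=${DRILL_MEM_PERCENT:-60}

# percent_of_total TOTAL_KB PERCENT -> prints the share in MB.
percent_of_total() {
  echo $(( $1 * $2 / 100 / 1024 ))
}

# Linux reports MemTotal in kB in /proc/meminfo.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "Giving Drill $(percent_of_total "$total_kb" "$DRILL_MEM_PERCENT") MB"
fi
```

    The catch: the same 60% is a different absolute number on a 256GB node 
than on a 128GB one, while Drill wants identical memory on every node.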
    
    So, maybe we express memory as the total *free* memory. Cool. We give Drill 
60%. Drill starts and everything is fine. Now, we also give Spark 60%. Spark 
starts. It complains in its logs (assuming we make this same change to the 
Spark startup scripts). But Spark then uses its memory and causes Drill to 
fail. We
check Drill logs. Nada. We have to check Spark's logs. Now, imagine doing this 
with five apps; the app that complains may not be the one to fail. And, imagine 
doing this across 100 nodes. Won't scale.
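    For concreteness, a one-shot free-memory check of this kind might look 
like the sketch below (function and variable names are hypothetical; 
`MemAvailable` needs a reasonably recent Linux kernel):

```shell
#!/bin/sh
# Hypothetical static startup check: warn if the memory Drill is
# configured to use exceeds what is currently available. Names are
# illustrative, not Drill's actual settings.

# check_free REQUESTED_MB AVAILABLE_MB -> 0 if the request fits,
# 1 (with a warning on stderr) if it does not.
check_free() {
  if [ "$1" -gt "$2" ]; then
    echo "WARN: Drill is configured for $1 MB but only $2 MB is free" >&2
    return 1
  fi
  return 0
}

if [ -r /proc/meminfo ]; then
  avail_mb=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)
  check_free "${DRILL_TOTAL_MB:-8192}" "$avail_mb" || true
fi
```

    The snapshot is valid only at the instant it runs, which is exactly the 
failure mode described: the check passes at Drill startup, Spark launches 
afterward, and nothing fires again.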
    
    Note that the problem is that we checked memory statically at startup. But, 
our problem was that things changed later: we launched an over-subscribed 
Spark. So, our script must run continuously, constantly checking if any new 
apps are launched. Since some apps grow memory over time, we have to check all 
other apps for total memory usage against that allocated to Drill.
    
    Now, presumably, all other apps are doing the same: Spark is continually 
checking, Storm is doing so, and so on. Now, the admin needs to gather all 
these logs (across dozens of nodes) and extract meaning. What we need, then, is 
a network endpoint to publish the information and a tool to gather and report 
that data. We've just invented monitoring tools.
    
    Taking a step back: what we really want to know is available system memory 
vs. that consumed by apps. So, what we want is a Linux-level monitoring of free 
memory. And, since we have other things to do, we want alerts when free memory 
drops below some point. We've now invented alerting tools.
    
    Now, we got into this mess because we launched apps without concern about 
the total memory usage on each node. That is, we didn't plan our app load to 
fit into our available memory. So, we turn this around. We've got 128GB (say) 
of memory. How do we run only those apps that fit, deferring those that don't? 
We've just invented YARN, Mesos, Kubernetes and the like.
    
    Now we get to the reason for the -1. The proposed change adds significant 
complexity to the scripts, *but can never solve the actual oversubscription 
problem*. For that, we need a global resource manager.
    
    Now, suppose that someone wants to run Drill without such a manager. 
Perhaps some distribution does not provide this tool and instead provides a 
tool that simply launches processes, leaving it to each process to struggle 
with its own resources. In such an environment, the vendor can add a check, 
such as this one, that will fire on all nodes and warn the user about potential 
oversubscription *on that node*, *at that moment*, *for that app* in *one app's 
log file*.
    
    To facilitate this, we can do three things.
    
    1. In the vendor-specific `distrib-env.sh` file, do any memory setting 
adjustments that are wanted.
    2. Modify `drillbit.sh` to call a `drill-check.sh` script, if it exists, 
just prior to launching Drill.
    3. In the vendor-specific `distrib-env.sh` file, do the check proposed here.
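
    Step 2, the only hook Apache Drill itself needs, might look like this 
sketch inside `drillbit.sh` (the `drill-check.sh` name comes from the list 
above; the function wrapper and directory layout are my assumptions):

```shell
#!/bin/sh
# Sketch of step 2: just before launching the Drillbit, run an optional
# vendor-supplied check script if one exists. Only this hook belongs in
# Apache Drill; the check itself is vendor territory.

# run_prelaunch_check DRILL_HOME_DIR -> 0 if no check script exists or the
# script passes, 1 if the script fails.
run_prelaunch_check() {
  if [ -x "$1/bin/drill-check.sh" ]; then
    "$1/bin/drill-check.sh" || return 1
  fi
  return 0
}

# In drillbit.sh this would gate the launch, e.g.:
#   run_prelaunch_check "$DRILL_HOME" || exit 1
```

    A missing script is not an error, so distributions that rely on YARN or 
Mesos simply ship nothing and pay no cost.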
    
    The only change needed in Apache Drill is step 2. Then each vendor can add 
the checks if they don't provide a resource manager. Those vendors (or users) 
that use YARN or Mesos or whatever don't need the checks because they have 
an overall tool that solves the problem for them.
    
    Thanks!

