Here are some more answers:
On Thu, May 27, 2021 at 10:09 AM Akshay Bhasin (BLOOMBERG/ 731 LEX) <[email protected]> wrote:

> Hi Ted,
>
> Yes sure - below are the 2 reasons for it -
>
> 1) If I run 5 drill machines in a cluster, all connected to a single end
> point at s3, I'll have to use the machines to create the parquet files.
> Now, there are 2 sub questions here -
>
> - I'm not sure if a single drill end point is exposed for me to query ...
> a unique cluster ID I can use where all requests will be load balanced ?
>

I believe that any of the 5 drill machines can handle queries completely symmetrically. When a query is received, the node that receives it does the planning and schedules execution fragments on the other nodes. As such, you can either put a load balancer in front of the cluster or do roughly the same thing with DNS round-robin. In practice it won't make much difference, though, because the load is spread around pretty well even if only one node does all of the planning (at least if your queries involve a lot of work).

> - What if the node goes down ? For instance, on a single node (say A in
> above example) - one user is running a read query & at the same time I run
> a create table query ? That would block and congest the node.
>

If a node goes down, any query involving it will fail. The loss will be detected within a few seconds, and any queries accepted by the cluster during that time may hang for a little bit. Once the failure has been detected, operation will continue without any problems. Clients may or may not retry their queries automatically (I think that most won't).

>
> 2) This is a minor one - and I could be wrong - I'm not sure drill can
> write to s3 bucket. I think you can only put/upload files there, you cannot
> write to it.
>

Charles' answer was on the mark here.
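For what it's worth, the round-robin and retry points above can also be handled on the client side. The following is only a minimal sketch, not something from the Drill distribution: the host names are hypothetical, `RoundRobinClient` and `post_query` are names made up for illustration, and it assumes the cluster's REST endpoint (`/query.json` on the default web port 8047) is reachable without authentication.

```python
import itertools
import json
import urllib.request

# Hypothetical host names; substitute the addresses of your own Drillbits.
DRILLBITS = ["drill1.example.com", "drill2.example.com", "drill3.example.com"]


def post_query(host, sql, timeout=30):
    """Submit one SQL statement to a Drillbit's REST endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        f"http://{host}:8047/query.json",  # assumes the default web port
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


class RoundRobinClient:
    """Rotate queries across nodes; on a connection error, try the next node."""

    def __init__(self, hosts, send=post_query):
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)
        self._send = send  # injectable so the rotation can be tested offline

    def query(self, sql):
        last_error = None
        # Try each node at most once per call.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            try:
                return self._send(host, sql)
            except OSError as error:  # URLError and socket timeouts are OSErrors
                last_error = error
        raise RuntimeError("no Drillbit accepted the query") from last_error
```

Usage would be along the lines of `RoundRobinClient(DRILLBITS).query("SELECT ...")`. Blind retries like this are only reasonable for read queries; retrying a create table statement that may have partly run is a different matter and is not handled here.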
