Here are some more answers:

On Thu, May 27, 2021 at 10:09 AM Akshay Bhasin (BLOOMBERG/ 731 LEX) <
[email protected]> wrote:

> Hi Ted,
>
> Yes sure - below are the 2 reasons for it -
>
> 1) If I run 5 drill machines in a cluster, all connected to a single end
> point at s3, I'll have to use the machines to create the parquet files.
> Now, there are 2 sub questions here -
>
> - I'm not sure if a single drill end point is exposed for me to query ...
> a unique cluster ID I can use where all requests will be load balanced ?
>

I believe that any of the 5 drill machines can handle queries completely
symmetrically. The node that receives a query does the planning and then
schedules execution fragments on the other nodes.

As such, you can either put a load balancer in front of the cluster or
do roughly the same thing using DNS round-robin. In practice it won't make
a lot of difference, though, because the load is spread around pretty well
even if only one node does all of the planning (at least if your queries
involve a lot of work).
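
If you would rather not set up a load balancer or DNS at all, roughly the
same effect can be had on the client side. Here is a minimal sketch in
Python, not a definitive implementation. The hostnames are made up, and the
port 8047 and /query.json path assume Drill's REST interface with its
default settings, so adjust them to your setup.

```python
import itertools

# Hypothetical hostnames; substitute the real addresses of your 5 nodes.
DRILLBITS = [
    "drill1.example.com",
    "drill2.example.com",
    "drill3.example.com",
    "drill4.example.com",
    "drill5.example.com",
]


def make_endpoint_picker(hosts, port=8047):
    """Return a function that hands out REST URLs in round-robin order.

    Port 8047 and the /query.json path are assumptions about the cluster
    configuration, not something taken from this thread.
    """
    ring = itertools.cycle(hosts)

    def next_endpoint():
        # Each call advances to the next host, wrapping around at the end.
        return f"http://{next(ring)}:{port}/query.json"

    return next_endpoint


next_endpoint = make_endpoint_picker(DRILLBITS)
```

Each call to next_endpoint() returns the URL of the next node in the list,
so successive queries get planned on different machines.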


> - What if the node goes down ? For instance, on a single node (say A in
> above example) - one user is running a read query & at the same time I run
> a create table query ? That would block and congest the node.
>

If a node goes down, any query involving it will fail. The loss will be
detected in a few seconds, and any queries accepted by the cluster during
that time may hang briefly. Once the failure has been detected,
operation will continue without any problems. Clients may or may not retry
their queries automatically (I think that most won't).
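
Since most clients probably won't retry for you, a small wrapper on your
side can do it. This is only a sketch under stated assumptions: submit is a
hypothetical function you supply that sends the query to one host and
raises on failure, and the 5 second delay is a guess at "a few seconds",
not a documented value. It is meant for read queries. Whether a failed
create table can be retried blindly is a separate question.

```python
import time


def run_with_retry(submit, hosts, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Try submit(host) on successive hosts until one call succeeds.

    `submit` is a caller-supplied (hypothetical) function that sends the
    query to a single host and raises an exception if the query fails.
    """
    if attempts < 1 or not hosts:
        raise ValueError("need at least one attempt and one host")

    last_error = None
    for i in range(attempts):
        # Move to a different host on each attempt, wrapping around.
        host = hosts[i % len(hosts)]
        try:
            return submit(host)
        except Exception as err:  # deliberately broad for a sketch
            last_error = err
            if i < attempts - 1:
                # Give the cluster time to notice the lost node.
                sleep(delay_s)
    raise last_error
```

The sleep parameter is there only so the wait can be swapped out when
testing; in normal use the default is fine.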


>
> 2) This is a minor one - and I could be wrong - I'm not sure drill can
> write to s3 bucket. I think you can only put/upload files there, you cannot
> write to it.
>

Charles' answer was on the mark here.
