Hi All,

As others have said, the only difference between how “small” and “large” 
queries run is the queue size and the amount of memory each query is 
allowed; the plan itself is the same. As I recall, those limits are spelled 
out in the docs.

Ensure that there is sufficient memory for the slicing up done by the 
queue: each admitted query gets a slice of direct memory, and that slice is 
in turn divided among the query's sorts, joins and hash aggregations.
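
If you want to see exactly what the queue is working with, something like 
this from any Drill client should list the relevant settings (the 
exec.queue.* option names are in the docs; I use SELECT * because the 
sys.options column layout has shifted between versions):

     SELECT * FROM sys.options WHERE name LIKE 'exec.queue.%';

From there the per-query memory split can be loosened, e.g. with

     ALTER SYSTEM SET `exec.queue.memory_ratio` = 1.0;

as James suggests below.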

- Paul

> On Aug 30, 2022, at 8:21 AM, James Turton <dz...@apache.org> wrote:
> 
> Hi François
> 
> This is quite intriguing. I believe that one difference in execution 
> conditions between the small and large queues is the amount of memory that 
> the query will be allowed. It could be interesting to see whether the 
> problem is connected to this difference by setting exec.queue.memory_ratio 
> to 1.0. Enabling debug logging in logback.xml might also reveal more about 
> what differs between these two executions of the same plan.
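> 
> For example, a stanza along these lines in conf/logback.xml should surface 
> the execution-side logging (the org.apache.drill.exec.work scope is just my 
> suggestion for keeping the noise down, and I'm assuming the FILE appender 
> defined in the logback.xml shipped with Drill):
> 
>   <logger name="org.apache.drill.exec.work" additivity="false">
>     <level value="debug"/>
>     <appender-ref ref="FILE"/>
>   </logger>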
> 
> Regards
> James
> 
>> On 2022/08/25 18:13, François Méthot wrote:
>> Hi,
>> 
>> I am looking for an explanation of a workaround I found for an issue that 
>> has been bugging my team for the past few weeks.
>> 
>> Last June we moved from Drill 1.12 to Drill 1.19... a long-overdue upgrade!
>> Not long after, we started seeing the issue described below.
>> 
>> We run a query daily on about 410GB of text data spread over ~2200 files;
>> it has a cost of ~22 billion, is queued as a Large query, and it completes
>> fine. When the same query runs on 200MB spread over 130 files (same data
>> format), with a cost of ~36 million, it is still queued as a Large query,
>> and it never completes.
>> 
>> The small-dataset query would stop making progress after a few minutes;
>> left running for hours, it showed no progress and never completed.
>> 
>> The last running fragment is a HASH_PARTITION_SENDER showing 100% fragment
>> time.
>> 
>> After many shot-in-the-dark debugging sessions, analysing the source data, 
>> etc., we reviewed our cluster configuration. When we changed
>>      exec.queue.threshold=30000000 to 60000000
>> so that our 200MB dataset query (cost ~36 million) is categorized as a 
>> small query, it started to complete consistently in less than 10 seconds.
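>> 
>> (For anyone wanting to try the same thing: exec.queue.threshold is a 
>> system option, so a change along these lines should do it,
>>      ALTER SYSTEM SET `exec.queue.threshold` = 60000000;
>> with the value adjusted to wherever your query costs land.)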
>> 
>> The physical plan is identical whether the query is Large or Small.
>> 
>> Is there an internal difference in Drill's execution depending on whether
>> the query is queued as small or large?
>> Would you be able to explain why this workaround works?
>> 
>> Cluster details:
>> 8 drillbits running in Kubernetes
>> 16GB direct memory
>> 4GB heap
>> data stored on a very efficient NFS, exposed as k8s PV/PVCs to the drillbit pods.
>> 
>> Thanks for any insight you can provide on that setting or with regard to
>> our initial problem.
>> François
>> 
> 
