Re: HUDI 0.6 Read Table Performance

Udit Mehrotra Sun, 25 Apr 2021 21:07:48 -0700

When you read the whole table using the datasource, do you provide a glob path
up to the file level or the partition level?
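For context, a Hudi 0.6 datasource read of a partitioned table is usually pointed at a glob with one `*` per partition level, plus one more `*` to reach the data files. The Hudi quickstart of that era uses `load(basePath + "/*/*/*/*")` for a three-level layout. The sketch below builds both variants the question asks about. `hudi_glob_path` and `read_hudi_table` are hypothetical helper names, and the sketch assumes every partition sits at the same fixed depth.

```python
# Hypothetical helpers (not part of Hudi): build the glob path handed to the
# Spark datasource when reading a Hudi 0.6 table with a fixed partition depth.


def hudi_glob_path(base_path: str, partition_depth: int, file_level: bool = True) -> str:
    """Return base_path followed by one '/*' per partition level, plus one
    more '/*' to match the data files when file_level is True."""
    if partition_depth < 0:
        raise ValueError("partition_depth must be non-negative")
    levels = partition_depth + (1 if file_level else 0)
    return base_path.rstrip("/") + "/*" * levels


def read_hudi_table(spark, base_path: str, partition_depth: int, file_level: bool = True):
    # Assumes `spark` is a SparkSession with the Hudi 0.6 bundle on its classpath.
    return spark.read.format("hudi").load(
        hudi_glob_path(base_path, partition_depth, file_level)
    )
```

For a table partitioned as `year/month/day`, `hudi_glob_path("s3://bucket/table", 3)` gives the file-level glob `s3://bucket/table/*/*/*/*`, and passing `file_level=False` gives the partition-level glob `s3://bucket/table/*/*/*`.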



> On Apr 25, 2021, at 9:05 PM, Tanuj <[email protected]> wrote:
> 
> Thanks Udit. I am just using the Spark datasource to read the full table.
> Sometimes we provide partitions and performance is OK, and sometimes we can't
> due to the nature of the data. Are you looking for the HUDI parameters that we
> set while reading the table?
> 
> 
>> On 2021/04/26 04:02:31, Udit Mehrotra <[email protected]> wrote: 
>> Hi Tanuj,
>> 
>> Can you provide the exact commands you are using to read the table? We might
>> be able to guide you based on that.
>> 
>> Thanks,
>> Udit
>> 
>> 
>>>> On Apr 25, 2021, at 8:34 PM, Tanuj <[email protected]> wrote:
>>> 
>>> Hi,
>>> We are using HUDI 0.6 and noticed that some HUDI tables are very slow to
>>> read, especially those with a large number of partitions, probably due to S3
>>> listing. I know some of these issues have been fixed in later versions of
>>> HUDI, but it will take us some time to migrate. Is there anything in 0.6 I
>>> can leverage?
>>> 
>>> I also don't understand what .aux/ does, as these folders are empty for us.
>>> We sometimes do an S3-to-S3 copy and read HUDI tables from the newly copied
>>> location, and we are able to read them without the .aux/ folders, since the
>>> S3 copy doesn't copy empty folders.
>>> 
>>> Thanks,
>>> Tanuj
>>
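Tanuj notes that reads perform acceptably when specific partitions can be provided. One way to keep listing confined to known partitions, assuming the datasource path goes through Hadoop globbing as Spark datasource paths normally do, is a single glob that uses `{a,b}` alternation at each partition level. `partition_glob` is a hypothetical helper, and the `year/month/day` layout in the example is an assumption, not something stated in this thread.

```python
# Hypothetical helper (not part of Hudi): restrict a Hudi 0.6 datasource read
# to known partition values with one Hadoop-style glob using {a,b} alternation.
from typing import Sequence


def partition_glob(base_path: str, values_per_level: Sequence[Sequence[str]]) -> str:
    parts = []
    for values in values_per_level:
        if not values:
            raise ValueError("each partition level needs at least one value")
        # One value is used literally; several become a {v1,v2,...} alternation.
        parts.append(values[0] if len(values) == 1 else "{" + ",".join(values) + "}")
    # The trailing '*' matches the data files inside each selected partition.
    return "/".join([base_path.rstrip("/"), *parts, "*"])
```

For example, `spark.read.format("hudi").load(partition_glob("s3://bucket/table", [["2021"], ["04"], ["24", "25"]]))` would point the read at `s3://bucket/table/2021/04/{24,25}/*`, so only two day partitions are listed instead of the whole table. This only helps when the needed partition values are known up front, which, as Tanuj says, is not always the case.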
