When you read the whole table using the datasource, do you provide the glob path up to the file level or only up to the partition level?
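
To make the two options concrete, here is a minimal sketch of what each glob would look like. It is only an illustration under assumptions: the helper `hudi_glob`, the bucket path, and the three-level partition layout are hypothetical and not taken from your setup. The commented-out read uses the `spark.read.format("hudi").load(...)` form, which expects an existing SparkSession with the Hudi bundle on the classpath.

```python
def hudi_glob(base_path: str, partition_depth: int, to_file_level: bool = True) -> str:
    """Build a glob path for a Hudi table read (hypothetical helper).

    partition_depth: number of partition directory levels under the base path.
    to_file_level:   if True, add one more "/*" so the glob reaches the data files.
    """
    if partition_depth < 0:
        raise ValueError("partition_depth must be non-negative")
    levels = partition_depth + (1 if to_file_level else 0)
    return base_path.rstrip("/") + "/*" * levels


# Hypothetical table location with three partition levels (e.g. year/month/day).
base = "s3://my-bucket/path/to/table"

partition_glob = hudi_glob(base, 3, to_file_level=False)  # stops at partition folders
file_glob = hudi_glob(base, 3)                            # reaches the files themselves

print(partition_glob)
print(file_glob)

# Assumed usage, requires a running SparkSession named `spark`:
# df = spark.read.format("hudi").load(file_glob)
```

Knowing which of these two forms you pass to `load` would help narrow down where the listing time is going.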
Sent from my iPhone

> On Apr 25, 2021, at 9:05 PM, Tanuj <[email protected]> wrote:
>
> Thanks Udit. I am just using the Spark datasource to read the full table.
> Sometimes we provide partitions and performance is OK, and sometimes we can't
> due to the nature of the data. Are you looking for the HUDI parameters that we
> set while reading the table?
>
>
>> On 2021/04/26 04:02:31, Udit Mehrotra <[email protected]> wrote:
>> Hi Tanuj,
>>
>> Can you provide the exact commands you use to read the table? We might be
>> able to guide you based on that.
>>
>> Thanks,
>> Udit
>>
>> Sent from my iPhone
>>
>>>> On Apr 25, 2021, at 8:34 PM, Tanuj <[email protected]> wrote:
>>>
>>> Hi,
>>> We are using HUDI 0.6 and noticed that some HUDI tables are very slow to
>>> read, especially with a large number of partitions, probably due to S3
>>> listing. I know that later versions of HUDI have fixed some of these
>>> issues, but it will take us some time to migrate. Is there anything in 0.6
>>> I can leverage?
>>>
>>> I also don't understand what .aux does, as these folders are empty for us.
>>> We sometimes do an S3-to-S3 copy and read HUDI tables from the newly copied
>>> location, and we are able to read them without the .aux/ folders.
>>> When the S3 copy runs, it doesn't copy empty folders.
>>>
>>> Thanks,
>>> Tanu
>>
