Thanks, Udit. We had been using /*/*/*. Now that I use /*/*, it's much faster. But why does the Quick Start Guide say we need the extra /*? What is the side effect or advantage of adding the extra /*?
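For reference, the paths compared in this thread can be sketched as below. This is only an illustration under assumptions: the helper names (partition_glob, months_glob, read_hudi) are made up for this example, the table is assumed to be partitioned as year/month as described further down, and the read assumes a Spark session with the Hudi bundle on the classpath.

```python
def partition_glob(base_path: str, depth: int) -> str:
    """Append one '/*' per level to the table base path (hypothetical helper)."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return base_path.rstrip("/") + "/*" * depth


def months_glob(base_path: str, year: int, months) -> str:
    """Build a brace glob selecting specific months of one year (hypothetical helper)."""
    month_list = ",".join(str(m) for m in months)
    return f"{base_path.rstrip('/')}/year={year}/month={{{month_list}}}"


def read_hudi(spark, path: str):
    """Load a Hudi table through the Spark data source.

    Assumes `spark` is an existing SparkSession with the Hudi bundle available.
    """
    return spark.read.format("hudi").load(path)


base = "s3a://sample_table"

# Glob down to the partition level (year/month): the faster form in this thread.
partition_level = partition_glob(base, 2)

# Glob one level further, down to the files: the slower form in this thread.
file_level = partition_glob(base, 3)

# Only selected partitions, as in the year=2020 example.
selected = months_glob(base, 2020, [1, 2, 3, 4])

print(partition_level)
print(file_level)
print(selected)
```

With a live session, the read would then be something like read_hudi(spark, partition_level).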

On 2021/04/26 04:20:53, Udit Mehrotra <[email protected]> wrote:
> In the example provided, what is the exact path you pass to the Hudi data
> source to read:
>
> s3a://sample_table/*/*/*
>
> Or
>
> s3a://sample_table/*/*
>
> Is my question. The first one is bound to be much slower because of the
> nature of S3 listing, as the Hudi filter will be applied to each and every
> file. You should use the second example, globbing to the partition level,
> not the file level.
>
> Sent from my iPhone
>
> > On Apr 25, 2021, at 9:15 PM, Tanuj <[email protected]> wrote:
> >
> > For example, if we have a table s3a://sample_table/ with partition path as
> > year and then month:
> > When we read partitions using a glob string like
> > s3a://sample_table/year=2020/month={1,2,3,4}, performance is good.
> > But when we read the whole table without providing any partition info, like
> > s3a://sample_table/, it does the whole listing of partitions, I believe, and
> > it becomes slow. We have close to 1k partitions in some of the tables.
> >
> >> On 2021/04/26 04:07:31, Udit Mehrotra <[email protected]> wrote:
> >> When you read the whole table using the datasource, do you provide the glob
> >> path up to file level or partition level?
> >>
> >> Sent from my iPhone
> >>
> >>>> On Apr 25, 2021, at 9:05 PM, Tanuj <[email protected]> wrote:
> >>>
> >>> Thanks Udit. I am just using the Spark data source to read the full table.
> >>> Sometimes we provide partitions and performance is OK, and sometimes we
> >>> can't due to the nature of the data. Are you looking for the HUDI
> >>> parameters that we set while reading the table?
> >>>
> >>>
> >>>> On 2021/04/26 04:02:31, Udit Mehrotra <[email protected]> wrote:
> >>>> Hi Tanuj,
> >>>>
> >>>> Can you provide the exact commands you use to read the table? We might
> >>>> be able to guide based on that.
> >>>>
> >>>> Thanks,
> >>>> Udit
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>>> On Apr 25, 2021, at 8:34 PM, Tanuj <[email protected]> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> We are using HUDI 0.6 and noticed that some Hudi tables are very slow
> >>>>> to read, especially with a large number of partitions, probably due to
> >>>>> S3 listing. I know in later versions of HUDI we have fixed some of the
> >>>>> issues, but it will take us some time to migrate. Is there anything in
> >>>>> 0.6 I can leverage?
> >>>>>
> >>>>> I also don't understand what .aux does, as these folders are empty for
> >>>>> us. We sometimes do an S3 to S3 copy and read HUDI tables from the newly
> >>>>> copied location, and are able to read without the .aux/ folders.
> >>>>> When the S3 copy works, it doesn't copy empty folders.
> >>>>>
> >>>>> Thanks,
> >>>>> Tanu
> >>>>
> >>
>
