Hi,

Yes, it can be an issue. It is probably better to write the table using Hive-style partitioning. I will look into this further and get back to you.
Balaji, do you know off the top of your head?

Thanks
Vinoth

On Sat, Jul 4, 2020 at 11:22 PM selvaraj periyasamy <[email protected]> wrote:

> To add some more info, my join condition would look at a 180-day range of
> folders.
>
> On Sat, Jul 4, 2020 at 11:13 PM selvaraj periyasamy <[email protected]> wrote:
>
> > Team,
> >
> > I have a question on keeping Hive in sync. A shared Hadoop environment
> > restricts me from using Hudi 0.5.1 or higher, so I ended up using 0.5.0.
> > Currently my Hadoop cluster has Hive 1.2.x, which does not support Hudi's
> > Hive sync.
> >
> > So I am not using the Hive feature. I am reading the table as below.
> >
> > sparkSession.
> >   read.
> >   format("org.apache.hudi").
> >   load("/projects/cdp/data/base/request_application/*/*").
> >   createOrReplaceTempView(s"base_request_application")
> >
> > I am going to store 3 years' worth of data partitioned by day/hour. When I
> > load 3 years of data, I would have (3*365*24) = 26280 directories. Using
> > the above approach and reading every time, I see that all the directory
> > names are indexed. Would it impact performance when joining with other
> > tables if I don't use the Hive way of partition pruning?
> >
> > Thanks,
> > Selva
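
Since the join only needs a 180-day range, one possible workaround without Hive is to narrow the load path to just those day folders, so that roughly 180*24 = 4320 directories are listed instead of 26280. Below is a minimal sketch of building such a glob. It assumes the day folders are named yyyyMMdd, which the thread does not state. `PartitionWindow`, `dayDirs` and `windowGlob` are hypothetical helper names, not part of Hudi or Spark. Whether the Hudi 0.5.0 datasource accepts a brace glob like this has not been verified here.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object PartitionWindow {
  // Assumption: day directories are named yyyyMMdd; adjust to the real layout.
  private val dayFormat = DateTimeFormatter.ofPattern("yyyyMMdd")

  /** Day-directory names for the `days` days ending at `end` (inclusive), oldest first. */
  def dayDirs(end: LocalDate, days: Int): Seq[String] =
    (days - 1 to 0 by -1).map(offset => end.minusDays(offset).format(dayFormat))

  /** One Hadoop-style glob that matches every hour directory under the selected days. */
  def windowGlob(basePath: String, end: LocalDate, days: Int): String =
    dayDirs(end, days).mkString(s"$basePath/{", ",", "}/*")
}

// Hypothetical usage with the reader from the original message (untested):
//   val glob = PartitionWindow.windowGlob(
//     "/projects/cdp/data/base/request_application", LocalDate.now(), 180)
//   sparkSession.read.format("org.apache.hudi").load(glob)
```

This only restricts which directories are listed at load time; it is not a substitute for Hive-style partition pruning, where a filter on the partition column selects the directories automatically.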
