Hi,

Yes, it can be an issue. It would probably be good to write the table using
Hive-style partitioning. I will look into this further and get back to you.
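
In the meantime, since the join only needs a 180-day range, one thing that might
be worth trying is to pass a narrower glob to load(...) instead of */*, so that
only the needed day folders are listed. Below is a minimal sketch, not something
verified against Hudi 0.5.0. It assumes the day folders are named yyyyMMdd (the
thread does not say) and that the path is expanded with Hadoop's {a,b,c} glob
syntax. PartitionGlob and lastDaysGlob are hypothetical names used only for
illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object PartitionGlob {
  // Hypothetical helper: builds one glob string that matches every hour
  // folder under each of the last `days` day folders, ending at `lastDay`.
  // Assumes a <basePath>/<day>/<hour> layout with day folders named
  // according to `dayFormat`.
  def lastDaysGlob(basePath: String,
                   lastDay: LocalDate,
                   days: Int,
                   dayFormat: DateTimeFormatter): String = {
    require(days > 0, "days must be positive")
    // Oldest day first, most recent day last.
    val dayDirs = (0 until days).reverse.map { offset =>
      lastDay.minusDays(offset.toLong).format(dayFormat)
    }
    val alternatives = dayDirs.mkString(",")
    // Brace alternation for the day level, "*" for the hour level.
    s"$basePath/{$alternatives}/*"
  }
}
```

With days = 180, the resulting string would take the place of
"/projects/cdp/data/base/request_application/*/*" in the load(...) call, if the
assumptions above hold for the table.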

Balaji, do you know off the top of your head?

Thanks
Vinoth

On Sat, Jul 4, 2020 at 11:22 PM selvaraj periyasamy <
[email protected]> wrote:

> Adding some more info: my join condition would only look at folders within
> a 180-day range.
>
> On Sat, Jul 4, 2020 at 11:13 PM selvaraj periyasamy <
> [email protected]> wrote:
>
> > Team,
> >
> > I have a question on keeping Hive in sync. Because a shared Hadoop
> > environment restricts me from using Hudi 0.5.1 or higher, I ended up
> > using 0.5.0. My Hadoop cluster currently runs Hive 1.2.x, which does not
> > support Hudi's Hive sync.
> >
> > So I am not using the Hive feature. I am reading the table as below.
> >
> >
> > sparkSession.
> > read.
> > format("org.apache.hudi").
> > load("/projects/cdp/data/base/request_application/*/*").
> > createOrReplaceTempView("base_request_application")
> >
> >
> > I am going to store 3 years' worth of data partitioned by day/hour. When I
> > load 3 years of data, I would have 3 * 365 * 24 = 26,280 directories. Using
> > the above approach and reading every time, I see that all the directory
> > names are indexed. Would it impact performance when joining with other
> > tables if I don't use Hive-style partition pruning?
> >
> > Thanks,
> > Selva
> >
> >
>
