Re: How to load only the data of the last partition

2016-11-18 Thread Rabin Banerjee
Hi, to do that you can write code that lists an HDFS directory first, then lists its sub-directories. Using custom logic, first identify the latest year/month/version, then read the Avro files in that directory into a DataFrame, and finally add year/month/version back to that DataFrame using withColumn.
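The "custom logic" step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not code from the thread. The function names `parse_partition` and `latest_partition` are hypothetical. The HDFS listing is stubbed with the paths from the original question; in a real job it would come from an HDFS client such as Hadoop's `FileSystem.listStatus`. The key detail is to compare partition values as integers: compared as plain strings, `month=9` would sort above `month=10`.

```python
import re
from typing import Iterable, Optional

# Matches one "key=value" path segment whose value is an integer, e.g. "month=10".
_SEGMENT = re.compile(r"^(\w+)=(\d+)$")


def parse_partition(path: str) -> dict:
    """Return the integer partition values encoded in a path such as
    /data/year=2016/month=10/version=3 (hypothetical helper)."""
    values = {}
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.match(segment)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def latest_partition(paths: Iterable[str],
                     keys=("year", "month", "version")) -> Optional[str]:
    """Pick the path with the largest (year, month, version) tuple,
    comparing numerically; return None if no path has all the keys."""
    candidates = []
    for path in paths:
        values = parse_partition(path)
        if all(k in values for k in keys):
            candidates.append((tuple(values[k] for k in keys), path))
    if not candidates:
        return None
    return max(candidates)[1]


if __name__ == "__main__":
    # Stand-in for an HDFS directory listing (assumption: leaf dirs only).
    listing = [
        "/data/year=2016/month=9/version=0",
        "/data/year=2016/month=10/version=0",
        "/data/year=2016/month=10/version=3",
        "/data/year=2016/month=11/version=0",
        "/data/year=2016/month=11/version=1",
    ]
    print(latest_partition(listing))
```

With the listing above, the selected directory is `/data/year=2016/month=11/version=1`. Reading only that leaf directory means Spark no longer sees the `year=`/`month=`/`version=` parents as partition columns. That is why the values from `parse_partition` would then be added back with `withColumn` and `lit`, as suggested above.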

Re: How to load only the data of the last partition

2016-11-18 Thread Samy Dindane
Thank you, Daniel. Unfortunately, we don't use Hive but bare (Avro) files. On 11/17/2016 08:47 PM, Daniel Haviv wrote: Hi Samy, If you're working with Hive, you could create a partitioned table and update its partitions' locations to the latest version, so that when you query it using Spark,

Re: How to load only the data of the last partition

2016-11-17 Thread Daniel Haviv
Hi Samy, If you're working with Hive, you could create a partitioned table and update its partitions' locations to the latest version, so that when you query it using Spark, you'll always get the latest version. Daniel On Thu, Nov 17, 2016 at 9:05 PM, Samy Dindane wrote:

How to load only the data of the last partition

2016-11-17 Thread Samy Dindane
Hi, I have some data partitioned this way:

/data/year=2016/month=9/version=0
/data/year=2016/month=10/version=0
/data/year=2016/month=10/version=1
/data/year=2016/month=10/version=2
/data/year=2016/month=10/version=3
/data/year=2016/month=11/version=0
/data/year=2016/month=11/version=1

When