Hi,

To do that, you can write code that lists the HDFS directory first and then its sub-directories. With that custom logic, identify the latest version for each year/month, read the Avro files in that directory into a DataFrame, and then add the year/month/version columns back to that DataFrame using withColumn. A rough sketch of this approach is below.
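Something along these lines (untested sketch; it assumes Spark 2.x with the Databricks spark-avro package, /data as the base path, and a simple numeric partition layout):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().getOrCreate()
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Extract the numeric value from a partition directory name like "month=10".
def partValue(p: Path): Int = p.getName.split("=")(1).toInt

val root = new Path("/data")  // illustrative base path

// For every year/month directory, keep only the sub-directory with the highest version.
val latestDirs = for {
  yearDir  <- fs.listStatus(root).filter(_.isDirectory).map(_.getPath)
  monthDir <- fs.listStatus(yearDir).filter(_.isDirectory).map(_.getPath)
} yield {
  val latest = fs.listStatus(monthDir).filter(_.isDirectory).map(_.getPath).maxBy(partValue)
  (partValue(yearDir), partValue(monthDir), partValue(latest), latest.toString)
}

// Read each "latest" directory and re-attach the partition columns lost by loading a leaf path.
val dfs = latestDirs.map { case (year, month, version, dir) =>
  spark.read.format("com.databricks.spark.avro").load(dir)
    .withColumn("year", lit(year))
    .withColumn("month", lit(month))
    .withColumn("version", lit(version))
}

// Union into a single DataFrame covering the latest version of every month.
val latestData = dfs.reduce(_ union _)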
Regards,
R Banerjee

On Fri, Nov 18, 2016 at 2:41 PM, Samy Dindane <s...@dindane.com> wrote:
> Thank you Daniel. Unfortunately, we don't use Hive but bare (Avro) files.
>
> On 11/17/2016 08:47 PM, Daniel Haviv wrote:
>> Hi Samy,
>> If you're working with Hive, you could create a partitioned table and
>> update its partitions' locations to point to the last version, so when
>> you query it using Spark, you'll always get the latest version.
>>
>> Daniel
>>
>> On Thu, Nov 17, 2016 at 9:05 PM, Samy Dindane <s...@dindane.com> wrote:
>>
>> Hi,
>>
>> I have some data partitioned this way:
>>
>> /data/year=2016/month=9/version=0
>> /data/year=2016/month=10/version=0
>> /data/year=2016/month=10/version=1
>> /data/year=2016/month=10/version=2
>> /data/year=2016/month=10/version=3
>> /data/year=2016/month=11/version=0
>> /data/year=2016/month=11/version=1
>>
>> When using this data, I'd like to load only the last version of each month.
>>
>> A simple way to do this is to do
>> `load("/data/year=2016/month=10/version=3")`
>> instead of `load("/data")`.
>> The drawback of this solution is the loss of partitioning information
>> such as `year` and `month`, which means it would no longer be possible
>> to apply operations based on the year or the month.
>>
>> Is it possible to ask Spark to load only the last version of each
>> month? How would you go about this?
>>
>> Thank you,
>>
>> Samy