Hi,

I have some data partitioned this way:

/data/year=2016/month=9/version=0
/data/year=2016/month=10/version=0
/data/year=2016/month=10/version=1
/data/year=2016/month=10/version=2
/data/year=2016/month=10/version=3
/data/year=2016/month=11/version=0
/data/year=2016/month=11/version=1

When using this data, I'd like to load only the latest version of each month.

A simple way to do this is to load each month's latest version directly, e.g.
`load("/data/year=2016/month=10/version=3")`, instead of `load("/data")`.
The drawback of this solution is that the partitioning information is lost:
`year` and `month` no longer appear as columns in the resulting DataFrame, so
operations based on the year or the month (filtering, grouping) are not
possible anymore.

Is it possible to ask Spark to load only the latest version of each month while
keeping the partition columns? How would you go about this?
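
A minimal sketch of one possible approach, assuming the partition directories can be listed up front (here the listing above is hard-coded; in practice it might come from the Hadoop FileSystem API). It picks the highest `version` for each (year, month) in plain Python. `latest_version_paths` and `PARTITION_RE` are hypothetical helpers for illustration, not Spark APIs.

```python
import re

# Matches leaf directories such as "/data/year=2016/month=10/version=3".
# Hypothetical pattern, based only on the layout shown in the listing above.
PARTITION_RE = re.compile(r"year=(\d+)/month=(\d+)/version=(\d+)$")


def latest_version_paths(paths):
    """Return the highest-version path for each (year, month), sorted by date.

    Versions are compared as integers, so version=10 wins over version=9.
    Paths that do not match the year/month/version layout are ignored.
    """
    best = {}  # (year, month) -> (version, path)
    for path in paths:
        match = PARTITION_RE.search(path.rstrip("/"))
        if match is None:
            continue
        year, month, version = (int(g) for g in match.groups())
        key = (year, month)
        if key not in best or version > best[key][0]:
            best[key] = (version, path)
    return [best[key][1] for key in sorted(best)]


# Assumed input: the directory listing from above, hard-coded for the example.
paths = [
    "/data/year=2016/month=9/version=0",
    "/data/year=2016/month=10/version=0",
    "/data/year=2016/month=10/version=1",
    "/data/year=2016/month=10/version=2",
    "/data/year=2016/month=10/version=3",
    "/data/year=2016/month=11/version=0",
    "/data/year=2016/month=11/version=1",
]

latest = latest_version_paths(paths)
print(latest)
# ['/data/year=2016/month=9/version=0', '/data/year=2016/month=10/version=3', '/data/year=2016/month=11/version=1']
```

The selected paths could then be handed to Spark with the `basePath` option pointing at the common root, e.g. `spark.read.option("basePath", "/data").load(latest)`, which should keep `year`, `month` and `version` as partition columns. This is untested against this particular dataset. An alternative is to load `/data` and keep only the rows whose `version` equals the maximum for their (year, month), at the cost of scanning every version.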

Thank you,

Samy

