good idea. since dates in this format sort correctly alphabetically, i should be able to do something similar with plain string comparisons
On Sun, Nov 1, 2015 at 4:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Try with max date, in your case it could make more sense to represent the
> date as int
>
> Sent from my iPhone
>
> On 01 Nov 2015, at 21:03, Koert Kuipers <ko...@tresata.com> wrote:
>
> hello all,
> i am trying to get familiar with spark sql partitioning support.
>
> my data is partitioned by date, so like this:
> data/date=2015-01-01
> data/date=2015-01-02
> data/date=2015-01-03
> ...
>
> let's say i would like a batch process to read data for the latest date
> only. how do i proceed?
> generally the latest date will be yesterday, but it could be a day older
> or maybe 2.
>
> i understand that i will have to do something like:
> df.filter(df("date") === some_date_string_here)
>
> however i do not know what some_date_string_here should be. i would like
> to inspect the available dates and pick the latest. is there an efficient
> way to find out what the available partitions are?
>
> thanks! koert
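the approach above (list the partition directories, pick the max, and rely on ISO dates sorting lexicographically) could look roughly like this. just a sketch for a Spark 1.x shell session: `sc` and `sqlContext` are the usual shell bindings, the "data" path and parquet format are assumed from the example layout, and nothing here is tested against a real cluster:

```scala
// list the date=... partition directories under data/ via the Hadoop FileSystem API
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
val dates = fs.listStatus(new Path("data"))
  .map(_.getPath.getName)          // e.g. "date=2015-01-01"
  .filter(_.startsWith("date="))
  .map(_.stripPrefix("date="))

// ISO yyyy-MM-dd strings sort correctly as plain strings,
// so max gives the latest available partition
val latest = dates.max

val df = sqlContext.read.parquet("data")
val latestDf = df.filter(df("date") === latest)
```

since the filter is on the partition column, spark should prune to the single latest directory rather than scanning everything.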