On 21 March 2016 at 17:57, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> > Hi, > > For test purposes I am ready a simple csv file as follows: > > val df = > sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", > "true").option("header", "true").load("/data/stg/table2") > df: org.apache.spark.sql.DataFrame = [Invoice Number: string, Payment > date: string, Net: string, VAT: string, Total: string] > > For this work I am interested in column "Payment Date" > 6 months old from > today > > Data is stored in the following format for that column > > scala> df.select("Payment date").take(2) > res40: Array[org.apache.spark.sql.Row] = Array([10/02/2014], [17/02/2014]) > > stored as 'dd/MM/yyyy' > > The current time I get as > > scala> val today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), > 'dd/MM/yyyy') ").collect.apply(0).getString(0) > today: String = 21/03/2016 > > > So I want to filter the csv file > > scala> df.filter(col("Payment date") < lit(today)).show(2) > +--------------+------------+---------+-----+---------+ > |Invoice Number|Payment date| Net| VAT| Total| > +--------------+------------+---------+-----+---------+ > | 360| 10/02/2014|?2,500.00|?0.00|?2,500.00| > | 361| 17/02/2014|?2,500.00|?0.00|?2,500.00| > +--------------+------------+---------+-----+---------+ > > > However, I want to use datediff() function here not just < today! > > Could you not compute which the date of the 6 month cut-off point and use that in place of today? Looking at the api I see an add_month(), date_add() and date_sub() methods, the first adds a number of months to a start date (would adding a -ve number of months to the current date work?), the latter two add or subtract a specified number of days to/from a date, these are available in 1.5.0 onwards. Alternatively outside of the SQL api (e.g. in a UDF) you could use something like: val c = Calendar.getInstance() > c.setTime(new Date(System.currentTimeMillis())) > c.add(Calendar.MONTH, -6) > val date: Date = c.getTime Regards, James > > Obviously one can store the file as a table and use SQL on it. However, I > want to see if there are other ways using fp. > > Thanks > > > > > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > >