On 21 March 2016 at 17:57, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

>
> Hi,
>
> For test purposes I am ready a simple csv file as follows:
>
> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("inferSchema",
> "true").option("header", "true").load("/data/stg/table2")
> df: org.apache.spark.sql.DataFrame = [Invoice Number: string, Payment
> date: string, Net: string, VAT: string, Total: string]
>
> For this work I am interested in column "Payment Date" > 6 months old from
> today
>
> Data is stored in the following format for that column
>
> scala> df.select("Payment date").take(2)
> res40: Array[org.apache.spark.sql.Row] = Array([10/02/2014], [17/02/2014])
>
> stored as 'dd/MM/yyyy'
>
> The current time I get as
>
> scala> val today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(),
> 'dd/MM/yyyy') ").collect.apply(0).getString(0)
> today: String = 21/03/2016
>
>
> So I want to filter the csv file
>
> scala>  df.filter(col("Payment date") < lit(today)).show(2)
> +--------------+------------+---------+-----+---------+
> |Invoice Number|Payment date|      Net|  VAT|    Total|
> +--------------+------------+---------+-----+---------+
> |           360|  10/02/2014|?2,500.00|?0.00|?2,500.00|
> |           361|  17/02/2014|?2,500.00|?0.00|?2,500.00|
> +--------------+------------+---------+-----+---------+
>
>
> However, I want to use datediff() function here not just < today!
>
>
Could you not compute which the date of the 6 month cut-off point and use
that in place of today?

Looking at the api I see an add_month(), date_add() and date_sub() methods,
the first adds a number of months to a start date (would adding a -ve
number of months to the current date work?), the latter two add or subtract
a specified number of days to/from a date, these are available in 1.5.0
onwards.

Alternatively outside of the SQL api (e.g. in a UDF) you could use
something like:

val c = Calendar.getInstance()
> c.setTime(new Date(System.currentTimeMillis()))
> c.add(Calendar.MONTH, -6)
> val date: Date = c.getTime


Regards,

James




>
> Obviously one can store the file as a table and use SQL on it. However, I
> want to see if there are other ways using fp.
>
> Thanks
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>

Reply via email to