Hi Jacek, I was wondering if I could use this approach instead.
It is basically a CSV read in as follows (note the context is assigned to sqlContext so the later calls resolve):

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("/data/stg/table2")

val current_date = sqlContext.sql("SELECT from_unixtime(unix_timestamp(), 'dd/MM/yyyy')").collect.apply(0).getString(0)

// Reverse "dd/MM/yyyy" into "yyyy-MM-dd"
def ChangeDate(word: String): String = {
  word.substring(6, 10) + "-" + word.substring(3, 5) + "-" + word.substring(0, 2)
}

// Register it as a custom UDF
sqlContext.udf.register("ChangeDate", ChangeDate(_: String))

The DF has the following schema:

scala> df.printSchema
root
 |-- Invoice Number: string (nullable = true)
 |-- Payment date: string (nullable = true)
 |-- Net: string (nullable = true)
 |-- VAT: string (nullable = true)
 |-- Total: string (nullable = true)

Now logically I want to filter out all "Payment date" values more than 6 months old, i.e.

current_date - "Payment date" > 6 months

For example, using:

months_between(current, "Payment date") > 6

However, I first need to convert "Payment date" from format "dd/MM/yyyy" to "yyyy-MM-dd", hence the UDF.

The question is: will this approach work?

Thanks,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 23 March 2016 at 21:26, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Why don't you use Datasets? You'd cut the number of getStrings and
> it'd read nicer to your eyes. Also, doing such transformations would
> *likely* be easier.
>
> p.s. Please gist your example to fix it.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Wed, Mar 23, 2016 at 10:20 PM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
> >
> > How can I convert the following from String to datetime?
> >
> > scala> df.map(x => (x.getString(1), ChangeDate(x.getString(1)))).take(1)
> > res60: Array[(String, String)] = Array((10/02/2014,2014-02-10))
> >
> > Please note that the custom UDF ChangeDate() has reversed the string value
> > from "dd/MM/yyyy" to "yyyy-MM-dd".
> >
> > Now I need to convert ChangeDate(x.getString(1)) from String to datetime:
> >
> > scala> df.map(x => (x.getString(1), ChangeDate(x.getString(1)).toDate)).take(1)
> > <console>:25: error: value toDate is not a member of String
> >
> > Or:
> >
> > scala> df.map(x => (x.getString(1), ChangeDate(x.getString(1)).cast("date"))).take(1)
> > <console>:25: error: value cast is not a member of String
> >
> > Thanks,
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
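[Editor's note] The logic discussed in the thread can be checked outside a Spark shell. Below is a minimal plain-Scala sketch of the same "dd/MM/yyyy" conversion and the six-month predicate, using java.time instead of substring arithmetic; the names `changeDate` and `olderThanSixMonths` are illustrative, not from the original code.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object DateCheck {
  private val inFmt = DateTimeFormatter.ofPattern("dd/MM/yyyy")

  // Same transformation the ChangeDate UDF performs, but via a real
  // parser: malformed input raises a clear exception instead of
  // producing a silently wrong string.
  def changeDate(word: String): String =
    LocalDate.parse(word, inFmt).toString  // LocalDate.toString is ISO yyyy-MM-dd

  // The "Payment date more than 6 months old" predicate from the question,
  // with "today" passed in so the function is deterministic and testable.
  def olderThanSixMonths(payment: String, today: LocalDate): Boolean = {
    val d = LocalDate.parse(payment, inFmt)
    ChronoUnit.MONTHS.between(d, today) > 6
  }
}
```

Within Spark itself (1.5 and later), the UDF is not strictly needed: `org.apache.spark.sql.functions` provides `unix_timestamp(col, fmt)`, `to_date`, `from_unixtime`, `current_date()` and `months_between`, so the filter could likely be expressed directly on the DataFrame, e.g. `df.filter(months_between(current_date(), to_date(from_unixtime(unix_timestamp(col("Payment date"), "dd/MM/yyyy")))) <= 6)`.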