Hi Jacek,

I was wondering if I could use this approach here.

Basically, a CSV file is read in as follows:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("/data/stg/table2")
val current_date = sqlContext.sql(
  "SELECT from_unixtime(unix_timestamp(), 'dd/MM/yyyy')").collect.apply(0).getString(0)
def ChangeDate(word: String): String = {
  word.substring(6, 10) + "-" + word.substring(3, 5) + "-" + word.substring(0, 2)
}
//
// Register it as a custom UDF
//
sqlContext.udf.register("ChangeDate", ChangeDate(_: String))
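As an aside, the substring-based reformat can also be done with java.time, which rejects malformed input instead of silently producing a garbage date. A minimal sketch (the name changeDateSafe is mine, not from the thread):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical alternative to the substring-based ChangeDate: parse the
// incoming "dd/MM/yyyy" string and re-emit it as ISO "yyyy-MM-dd".
// A malformed input throws DateTimeParseException rather than yielding
// a mangled date string.
def changeDateSafe(word: String): String = {
  val in = DateTimeFormatter.ofPattern("dd/MM/yyyy")
  LocalDate.parse(word, in).toString // LocalDate.toString prints yyyy-MM-dd
}

println(changeDateSafe("10/02/2014")) // 2014-02-10
```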

The DF has the following schema

scala> df.printSchema
root
 |-- Invoice Number: string (nullable = true)
 |-- Payment date: string (nullable = true)
 |-- Net: string (nullable = true)
 |-- VAT: string (nullable = true)
 |-- Total: string (nullable = true)


Now, logically, I want to filter out all "Payment date" values more than 6
months old,

I.e.

current_date - "Payment date" > 6 months

For example use months_difference (current, "Payment date") > 6

However, I first need to convert "Payment date" from format "dd/MM/yyyy" to
"yyyy-MM-dd", hence the UDF.
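The 6-month cutoff itself is straightforward once the string is in ISO form; here is a sketch of that test in plain Scala with java.time, outside Spark (the dates and the helper name olderThanSixMonths are illustrative, not from the thread):

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Sketch of the "more than 6 months old" check on an already-converted
// yyyy-MM-dd string, relative to a given reference date.
def olderThanSixMonths(paymentDate: String, today: LocalDate): Boolean =
  ChronoUnit.MONTHS.between(LocalDate.parse(paymentDate), today) > 6

val today = LocalDate.of(2016, 3, 24) // roughly the date of this thread
println(olderThanSixMonths("2014-02-10", today)) // true: over two years old
println(olderThanSixMonths("2016-01-10", today)) // false: about two months old
```

Within Spark itself, the built-in SQL function months_between (available since Spark 1.5) should express the same condition without a hand-rolled UDF, assuming the column is first cast to a date.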

The question is: will this approach work?

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 23 March 2016 at 21:26, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Why don't you use Datasets? You'd cut the number of getStrings and
> it'd read nicer to your eyes. Also, doing such transformations would
> *likely* be easier.
>
> p.s. Please gist your example to fix it.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Wed, Mar 23, 2016 at 10:20 PM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
> >
> > How can I convert the following from String to datetime
> >
> > scala> df.map(x => (x.getString(1), ChangeDate(x.getString(1)))).take(1)
> > res60: Array[(String, String)] = Array((10/02/2014,2014-02-10))
> >
> > Please note that the custom UDF ChangeDate() has reversed the string value
> > from "dd/MM/yyyy" to "yyyy-MM-dd"
> >
> > Now I need to convert ChangeDate(x.getString(1)) from String to datetime?
> >
> > scala> df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).toDate)).take(1)
> > <console>:25: error: value toDate is not a member of String
> >               df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).toDate)).take(1)
> >
> > Or
> >
> > scala> df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).cast("date"))).take(1)
> > <console>:25: error: value cast is not a member of String
> >               df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).cast("date"))).take(1)
> >
> >
> > Thanks,
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> >
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
