> CSV data is stored in an underlying table in Hive (actually created and populated as an ORC table by Spark)
How is it possible?

On Mon, Mar 28, 2016 at 1:50 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> A while back I was looking for a functional-programming way to filter out
> transactions older than n months.
>
> This turned out to be pretty easy.
>
> I get today's date as follows:
>
> var today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(),
>   'yyyy-MM-dd')").collect.apply(0).getString(0)
>
> CSV data is stored in an underlying table in Hive (actually created and
> populated as an ORC table by Spark):
>
> HiveContext.sql("use accounts")
> var n = HiveContext.table("nw_10124772")
>
> scala> n.printSchema
> root
>  |-- transactiondate: date (nullable = true)
>  |-- transactiontype: string (nullable = true)
>  |-- description: string (nullable = true)
>  |-- value: double (nullable = true)
>  |-- balance: double (nullable = true)
>  |-- accountname: string (nullable = true)
>  |-- accountnumber: integer (nullable = true)
>
> //
> // Check for historical transactions > 60 months old
> //
> var old: Int = 60
>
> val rs = n.filter(add_months(col("transactiondate"), old) < lit(today))
>           .select(lit(today), col("transactiondate"), add_months(col("transactiondate"), old))
>           .collect.foreach(println)
>
> [2016-03-27,2011-03-22,2016-03-22]
> [2016-03-27,2011-03-22,2016-03-22]
> [2016-03-27,2011-03-22,2016-03-22]
> [2016-03-27,2011-03-22,2016-03-22]
> [2016-03-27,2011-03-23,2016-03-23]
> [2016-03-27,2011-03-23,2016-03-23]
>
> Which seems to work. Any other suggestions will be appreciated.
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
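For what it's worth, the age check in the quoted snippet can be sketched without a Spark session, using plain Scala and java.time — a minimal illustration of the same cutoff logic, not the original Spark code. The `olderThanMonths` helper and the sample dates are my own; only the dates and the 60-month threshold come from the email.

```scala
import java.time.LocalDate

// A transaction is "older than n months" when transactiondate plus n months
// is still before today -- the same predicate as the email's
// add_months(col("transactiondate"), old) < lit(today).
def olderThanMonths(transactionDate: LocalDate, today: LocalDate, n: Int): Boolean =
  transactionDate.plusMonths(n).isBefore(today)

val today = LocalDate.parse("2016-03-27")  // the "today" value from the email's output
val old   = 60                             // 60-month threshold, as in the email

// Sample transaction dates matching the printed rows.
val txDates = Seq(LocalDate.parse("2011-03-22"), LocalDate.parse("2011-03-23"))

// Print rows in the same [today, transactiondate, transactiondate + 60 months] shape.
for (d <- txDates if olderThanMonths(d, today, old))
  println(s"[$today,$d,${d.plusMonths(old)}]")
```

Note that `add_months(col("transactiondate"), old) < lit(today)` is equivalent to comparing `transactiondate` against a fixed cutoff of today minus 60 months, which lets the predicate be pushed down against the column directly.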