Spark ANSI SQL Support

2017-01-17 Thread Rishabh Bhardwaj
Hi All,

Does Spark 2.0 SQL support the full ANSI SQL query standard?

Thanks,
Rishabh.


Re: Time-Series Analysis with Spark

2017-01-11 Thread Rishabh Bhardwaj
spark-ts currently does not support Seasonal ARIMA. There is an open issue
for it: https://github.com/sryza/spark-timeseries/issues/156
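
Plain (non-seasonal) ARIMA is in there, though. A rough sketch of fitting
and forecasting with it (untested; assumes spark-ts 0.4.x on the classpath,
and the series below is synthetic):

import com.cloudera.sparkts.models.ARIMA
import org.apache.spark.mllib.linalg.Vectors

// Made-up series: upward trend plus a sine wiggle.
val ts = Vectors.dense((1 to 50).map(i => 0.1 * i + math.sin(i / 3.0)).toArray)

// Fit ARIMA(p = 1, d = 0, q = 1) and forecast the next 5 points.
val model = ARIMA.fitModel(1, 0, 1, ts)
val forecast = model.forecast(ts, 5)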

On Wed, Jan 11, 2017 at 3:50 PM, Sean Owen <so...@cloudera.com> wrote:

> https://github.com/sryza/spark-timeseries ?
>
> On Wed, Jan 11, 2017 at 10:11 AM Rishabh Bhardwaj <rbnex...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am exploring time-series forecasting with Spark.
>> I have some questions regarding this:
>>
>> 1. Is there any library/package in the community that implements *Seasonal
>> ARIMA* on Spark?
>>
>> 2. Is there any implementation of Dynamic Linear Models (*DLM*) on Spark?
>>
>> 3. Which libraries can be leveraged for time-series analysis with Spark
>> (like spark-ts)?
>>
>> 4. Is there any roadmap for adding time-series algorithms like AR, MA, and
>> ARIMA to the Spark codebase itself?
>>
>> Thank You,
>> Rishabh.
>>
>


Time-Series Analysis with Spark

2017-01-11 Thread Rishabh Bhardwaj
Hi All,

I am exploring time-series forecasting with Spark.
I have some questions regarding this:

1. Is there any library/package in the community that implements *Seasonal
ARIMA* on Spark?

2. Is there any implementation of Dynamic Linear Models (*DLM*) on Spark?

3. Which libraries can be leveraged for time-series analysis with Spark
(like spark-ts)?

4. Is there any roadmap for adding time-series algorithms like AR, MA, and
ARIMA to the Spark codebase itself?

Thank You,
Rishabh.


Re: write and call UDF in spark dataframe

2016-07-20 Thread Rishabh Bhardwaj
Hi Divya,

There is already "from_unixtime" exists in org.apache.spark.sql.frunctions,
Rabin has used that in the sql query,if you want to use it in dataframe DSL
you can try like this,

val new_df = df.select(from_unixtime($"time").as("newtime"))
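
And if you would rather write your own UDF, as originally asked, here is a
rough sketch (the column name "time" and the output format are just
placeholders):

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.functions.udf

// Unix timestamp in seconds -> formatted date-time string.
val unixToDateTime = udf { (ts: Long) =>
  new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date(ts * 1000L))
}
val new_df2 = df.select(unixToDateTime($"time").as("newtime"))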


Thanks,
Rishabh.

On Wed, Jul 20, 2016 at 4:21 PM, Rabin Banerjee <
dev.rabin.baner...@gmail.com> wrote:

> Hi Divya ,
>
> Try,
>
> val df = sqlContext.sql("select from_unixtime(ts,'yyyy-MM-dd') as `ts` from
> mr")
>
> Regards,
> Rabin
>
> On Wed, Jul 20, 2016 at 12:44 PM, Divya Gehlot 
> wrote:
>
>> Hi,
>> Could somebody share an example of writing and calling a UDF that converts
>> a unix timestamp to a date-time?
>>
>>
>> Thanks,
>> Divya
>>
>
>


Deploying ML Pipeline Model

2016-07-01 Thread Rishabh Bhardwaj
Hi All,

I am looking for ways to deploy an ML Pipeline model in production.
Spark has already proved to be one of the best frameworks for model
training and creation, but once the ML Pipeline model is ready, how can I
deploy it outside the Spark context?
MLlib models have a toPMML method, but today a Pipeline model cannot be
saved to PMML. Frameworks like MLeap are trying to abstract the Pipeline
model and provide ML Pipeline model deployment outside the Spark context,
but they currently don't support most of the ML transformers and
estimators.
I am looking for related work going on in this area.
Any pointers will be helpful.
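
For reference, the MLlib toPMML route I mentioned looks like this (a rough
sketch with made-up data; only some MLlib models mix in PMMLExportable,
KMeansModel among them):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Tiny made-up dataset with two obvious clusters.
val data = sc.parallelize(Seq(
  Vectors.dense(1.0, 1.0), Vectors.dense(1.2, 0.8),
  Vectors.dense(9.0, 9.0), Vectors.dense(8.8, 9.1)))
val model = KMeans.train(data, 2, 10) // k = 2, 10 iterations
model.toPMML("/tmp/kmeans.pmml")      // writes PMML to a local path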

Thanks,
Rishabh.


Re: Unable to register UDF with StructType

2015-11-06 Thread Rishabh Bhardwaj
Thanks for your response.
Actually, I want to return a Row (i.e., return a struct).
But attempting to register a UDF returning Row throws the following error:

scala> def test(r: Row): Row = r
test: (r: org.apache.spark.sql.Row)org.apache.spark.sql.Row

scala> sqlContext.udf.register("test", test _)
java.lang.UnsupportedOperationException: Schema for type
org.apache.spark.sql.Row is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
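
For a fixed, statically-known schema, the case-class route from the reply
below does register (a minimal sketch; the names here are placeholders):

scala> case class StringInfo(numChars: Int, firstLetter: String)
scala> sqlContext.udf.register("stringInfo", (s: String) => StringInfo(s.length, s.head.toString))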


The use case I am working on requires creating case classes dynamically.
It would be a great help if you could give me some pointers on creating
case classes dynamically.

TIA.


On Fri, Nov 6, 2015 at 12:20 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> You are returning the type StructType, not an instance of a struct (i.e.,
> StringType instead of "string").  If you'd like to return a struct, you
> should return a case class.
>
> import org.apache.spark.sql.functions.udf
> case class StringInfo(numChars: Int, firstLetter: String)
> udf((s: String) => StringInfo(s.size, s.head.toString))
>
> If you'd like to take a struct as input, use Row as the type.
>
> On Thu, Nov 5, 2015 at 9:53 PM, Rishabh Bhardwaj <rbnex...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am unable to register a UDF with StructType as the return type:
>>
>> scala> def test(r: StructType): StructType = { r }
>> test: (r: org.apache.spark.sql.types.StructType)org.apache.spark.sql.types.StructType
>>
>> scala> sqlContext.udf.register("test", test _)
>> scala.MatchError: org.apache.spark.sql.types.StructType (of class
>> scala.reflect.internal.Types$TypeRef$$anon$6)
>>   at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:101)
>>   at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
>>   at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
>> Can someone throw some light on this? Is there a workaround for it?
>>
>> TIA.
>>
>> Regards,
>> Rishabh.
>>
>
>


Unable to register UDF with StructType

2015-11-05 Thread Rishabh Bhardwaj
Hi all,

I am unable to register a UDF with StructType as the return type:

scala> def test(r: StructType): StructType = { r }
test: (r: org.apache.spark.sql.types.StructType)org.apache.spark.sql.types.StructType

scala> sqlContext.udf.register("test", test _)
scala.MatchError: org.apache.spark.sql.types.StructType (of class
scala.reflect.internal.Types$TypeRef$$anon$6)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:101)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
Can someone throw some light on this? Is there a workaround for it?

TIA.

Regards,
Rishabh.


DataFrame column structure change

2015-08-07 Thread Rishabh Bhardwaj
Hi all,

I want to create some nested structure from the existing columns of
the DataFrame.
For that, I am trying to transform a DF in the following way, but couldn't
do it.

scala> df.printSchema
root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- d: string (nullable = true)
 |-- e: string (nullable = true)
 |-- f: string (nullable = true)

*To*

scala> newDF.printSchema
root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- newCol: struct (nullable = true)
 |    |-- d: string (nullable = true)
 |    |-- e: string (nullable = true)


Please help me.

Regards,
Rishabh.


Re: DataFrame column structure change

2015-08-07 Thread Rishabh Bhardwaj
Currently I am doing it by creating a new DataFrame out of the fields to be
nested and then joining it with the original DF.
I am looking for a more optimized solution here.
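
Ideally I would like to avoid the join entirely, something along these
lines (an untested sketch; struct comes from org.apache.spark.sql.functions,
and $ needs sqlContext.implicits._):

import org.apache.spark.sql.functions.struct
import sqlContext.implicits._

// Build the nested column in a single select instead of a join.
val newDF = df.select($"a", $"b", $"c", struct($"d", $"e").as("newCol"))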

On Fri, Aug 7, 2015 at 2:06 PM, Rishabh Bhardwaj <rbnex...@gmail.com> wrote:

> Hi all,
>
> I want to create some nested structure from the existing columns of
> the DataFrame.
> For that, I am trying to transform a DF in the following way, but couldn't
> do it.
>
> scala> df.printSchema
> root
>  |-- a: string (nullable = true)
>  |-- b: string (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: string (nullable = true)
>  |-- e: string (nullable = true)
>  |-- f: string (nullable = true)
>
> *To*
>
> scala> newDF.printSchema
> root
>  |-- a: string (nullable = true)
>  |-- b: string (nullable = true)
>  |-- c: string (nullable = true)
>  |-- newCol: struct (nullable = true)
>  |    |-- d: string (nullable = true)
>  |    |-- e: string (nullable = true)
>
>
> Please help me.
>
> Regards,
> Rishabh.



Cast Error DataFrame/RDD doing group by and case class

2015-07-30 Thread Rishabh Bhardwaj
Hi,
I have just started learning DataFrames in Spark and have encountered the
following error.

I am creating the following:
case class A(a1: String, a2: String, a3: String)
case class B(b1: String, b2: String, b3: String)
case class C(key: A, value: Seq[B])


Now I have a DF with the structure
(key: {..}, value: {..}), i.e. rows pairing a key of type A with a value of
type B.

I want to do a group-by on this DF that results in
(key, List(value1, value2, ...)) and return a DF after the operation.

I am implementing it as follows:

1. val x = DF1.map(r => (r(0), r(1))).groupByKey
The data in x comes out as expected.

2. val y = x.map { case (k, v) => C(k.asInstanceOf[A], Seq(v.toSeq.asInstanceOf[B])) }
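
Putting it all together, here is a self-contained sketch of what I am
running in spark-shell (the sample data is made up):

case class A(a1: String, a2: String, a3: String)
case class B(b1: String, b2: String, b3: String)
case class C(key: A, value: Seq[B])

import sqlContext.implicits._

// Two rows sharing the same key, so the groupByKey has something to group.
val DF1 = sc.parallelize(Seq(
  (A("k1", "k2", "k3"), B("v1", "v2", "v3")),
  (A("k1", "k2", "k3"), B("v4", "v5", "v6")))).toDF("key", "value")

val x = DF1.map(r => (r(0), r(1))).groupByKey
val y = x.map { case (k, v) => C(k.asInstanceOf[A], Seq(v.toSeq.asInstanceOf[B])) }
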
Now when I run y.toDF.show, I get the following error:


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 10.0 failed 1 times, most recent failure: Lost task 0.0 in stage
10.0 (TID 12, localhost): java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be
cast to $iwC$$iwC$A


Thanks in advance.

Regards,
Rishabh.