Re: VectorUDT with spark.ml.linalg.Vector

2016-08-18 Thread Yanbo Liang
@Michal

Yes, in 1.6 the spark.mllib package has a public VectorUDT, and that class
still exists in 2.0.
Starting with 2.0, we also provide a new VectorUDT in the spark.ml package,
but it is temporarily private (it will become public in the near future).
Since the spark.mllib package is in maintenance mode as of 2.0, we
strongly recommend using the DataFrame-based spark.ml API.

Thanks
Yanbo
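Because the old and new vector types coexist in 2.0, moving from spark.mllib code to the recommended spark.ml API means converting vectors explicitly. A minimal sketch, assuming Spark 2.0+ on the classpath; `asML` and `Vectors.fromML` are the conversion methods on the spark.mllib side, and the object name `VectorConversion` is only illustrative:

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

object VectorConversion {
  // A vector from the RDD-based spark.mllib package (maintenance mode as of 2.0).
  val oldVec: OldVector = OldVectors.dense(1.0, 0.0, 3.0)

  // Convert it to the DataFrame-based spark.ml type.
  val newVec: MLVector = oldVec.asML

  // Convert back, e.g. for code that still expects spark.mllib vectors.
  val roundTrip: OldVector = OldVectors.fromML(newVec)
}
```

For whole DataFrame columns, `org.apache.spark.mllib.util.MLUtils.convertVectorColumnsToML` performs the same conversion column by column.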



Re: VectorUDT with spark.ml.linalg.Vector

2016-08-17 Thread Michał Zieliński
I'm using Spark 1.6.2 for a Vector-based UDAF, and this works:

def inputSchema: StructType = new StructType().add("input", new VectorUDT())

Maybe it was made private in 2.0.
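Filled out with its imports, the 1.6 approach looks roughly like this. A sketch assuming Spark 1.6.x, where `org.apache.spark.mllib.linalg.VectorUDT` is public; the object name and the buffer column name `acc` are illustrative, and only the three schema members of a UDAF are shown:

```scala
import org.apache.spark.mllib.linalg.VectorUDT
import org.apache.spark.sql.types.{DataType, StructType}

// The schema members a Vector-based UDAF declares, using the public spark.mllib VectorUDT.
object MllibVectorSchemas {
  // Input column of spark.mllib vectors.
  def inputSchema: StructType = new StructType().add("input", new VectorUDT())
  // Aggregation buffer holding a running vector.
  def bufferSchema: StructType = new StructType().add("acc", new VectorUDT())
  // Type of the aggregate's result.
  def dataType: DataType = new VectorUDT()
}
```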



Re: VectorUDT with spark.ml.linalg.Vector

2016-08-16 Thread Alexey Svyatkovskiy
Hi Yanbo,

Thanks for your reply. I will keep an eye on that pull request.
For now, I decided to put my code inside the org.apache.spark.ml package
so that I can access its private classes.

Thanks,
Alexey
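This workaround relies on Scala's package-qualified access modifiers (for example `private[spark]`), which make a class visible to any code compiled under that package tree. A minimal sketch, assuming the 2.0 `VectorUDT` is package-private to Spark in this way; the package `org.apache.spark.ml.myudafs` and the object `InternalVectorSchemas` are hypothetical, and code like this depends on Spark internals that may change between releases:

```scala
// Hypothetical package, chosen only so Spark's package-private classes are visible.
package org.apache.spark.ml.myudafs

import org.apache.spark.ml.linalg.VectorUDT
import org.apache.spark.sql.types.{DataType, StructType}

object InternalVectorSchemas {
  // Accessible here only because this file sits under the org.apache.spark package tree.
  val inputSchema: StructType = new StructType().add("input", new VectorUDT())
  val dataType: DataType = new VectorUDT()
}
```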



Re: VectorUDT with spark.ml.linalg.Vector

2016-08-16 Thread Yanbo Liang
It seems that VectorUDT is currently private and cannot be accessed outside
of Spark. It should be public, but we need to do some refactoring before
making it public. You can refer to the discussion at
https://github.com/apache/spark/pull/12259.

Thanks
Yanbo

2016-08-16 9:48 GMT-07:00 alexeys :

> I am writing a UDAF to be applied to a DataFrame column of type Vector
> (spark.ml.linalg.Vector). I rely on spark/ml/linalg so that I do not have
> to go back and forth between DataFrame and RDD.
>
> Inside the UDAF, I have to specify a data type for the input, buffer, and
> output (as usual). VectorUDT is what I would use with
> spark.mllib.linalg.Vector:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
>
> However, when I try to import it from spark.ml instead:
> import org.apache.spark.ml.linalg.VectorUDT
> I get a runtime error (no errors during the build):
>
> class VectorUDT in package linalg cannot be accessed in package
> org.apache.spark.ml.linalg
>
> Is this expected? Can you suggest a workaround?
>
> I am using Spark 2.0.0.
>
> Thanks,
> Alexey
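A public route that needs neither the private class nor the package trick is `org.apache.spark.ml.linalg.SQLDataTypes.VectorType`, a `DataType` instance for spark.ml vectors that, as far as we know, has been exposed as a developer API since 2.0. The sketch below assumes Spark 2.x and the `UserDefinedAggregateFunction` API (deprecated in 3.0 in favor of `Aggregator`); the class `VectorSum`, which sums vectors of a fixed size `n` element-wise, is illustrative:

```scala
import org.apache.spark.ml.linalg.{SQLDataTypes, Vector, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, StructType}

// Illustrative UDAF: element-wise sum of spark.ml vectors of a fixed size n.
class VectorSum(n: Int) extends UserDefinedAggregateFunction {
  // SQLDataTypes.VectorType stands in for the private ml VectorUDT.
  override def inputSchema: StructType =
    new StructType().add("input", SQLDataTypes.VectorType)
  override def bufferSchema: StructType =
    new StructType().add("sum", SQLDataTypes.VectorType)
  override def dataType: DataType = SQLDataTypes.VectorType
  override def deterministic: Boolean = true

  // Start from an all-zero vector of length n.
  override def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer.update(0, Vectors.zeros(n))

  // Replace the buffered vector with (buffered + other); works for dense and sparse input.
  private def addInto(buffer: MutableAggregationBuffer, other: Vector): Unit = {
    require(other.size == n, s"expected a vector of size $n, got ${other.size}")
    val current = buffer.getAs[Vector](0)
    buffer.update(0, Vectors.dense(Array.tabulate(n)(i => current(i) + other(i))))
  }

  // Skip null input rows.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) addInto(buffer, input.getAs[Vector](0))

  // Combine partial sums computed on different partitions.
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    addInto(buffer1, buffer2.getAs[Vector](0))

  override def evaluate(buffer: Row): Any = buffer.getAs[Vector](0)
}
```

Assuming a DataFrame `df` with a spark.ml vector column named `features` of size 3 (both hypothetical), the aggregate would be applied as `df.agg(new VectorSum(3)(df("features")))`.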