Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-04 Thread Peter Rudenko
Hi Brandon, they are available in 1.3.1, but they are private to the ml
package. They are public as of 1.4. For 1.3.1 you can define your
transformer inside the org.apache.spark.ml package; then you can use these
traits.


Thanks,
Peter Rudenko

On 2015-06-04 20:28, Brandon Plaster wrote:
Are HasInputCol and HasOutputCol available in 1.3.1? I'm getting
the following messages when I try to implement a Transformer and
import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}:


error: object shared is not a member of package org.apache.spark.ml.param


and

error: trait HasInputCol in package param cannot be accessed in package org.apache.spark.ml.param



On Tue, Jun 2, 2015 at 1:51 PM, Peter Rudenko petro.rude...@gmail.com wrote:


Hi Dimple,
take a look at the existing transformers:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
(*it's for spark-1.4)

The idea is just to implement a class that extends Transformer
with HasInputCol with HasOutputCol (if your transformer is a 1:1 column
transformer) and has a

def transform(dataset: DataFrame): DataFrame

method.

Thanks,
Peter

On 2015-06-02 20:19, dimple wrote:

Hi,
I would like to embed my own transformer in the Spark.ml Pipeline but do
not see an example of it. Can someone share an example of which
classes/interfaces I need to extend/implement in order to do so? Thanks.

Dimple



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Embedding-your-own-transformer-in-Spark-ml-Pipleline-tp23112.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org








Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-02 Thread Peter Rudenko

Hi Dimple,
take a look at the existing transformers:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
(*it's for spark-1.4)

The idea is just to implement a class that extends Transformer
with HasInputCol with HasOutputCol (if your transformer is a 1:1 column
transformer) and has a

def transform(dataset: DataFrame): DataFrame

method.
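
As a rough illustration of the pattern described above, here is a minimal sketch of a 1:1 column transformer. It is written against the Spark 2.x and later API, where transform takes a Dataset[_] rather than the DataFrame in the signature quoted above. The class name UpperCaser and its upper-casing behaviour are made up for illustration. The sketch declares its own inputCol and outputCol params instead of mixing in HasInputCol and HasOutputCol, so it does not depend on whether those traits are accessible in a given release.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical example class: copies a string column into a new upper-cased column.
class UpperCaser(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("upperCaser"))

  // Locally declared params (an assumption of this sketch, used in place of
  // the shared HasInputCol / HasOutputCol traits).
  final val inputCol = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // Validate the schema first, then append the derived column.
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn($(outputCol), upper(col($(inputCol))))
  }

  // Describe the output schema without touching any data.
  override def transformSchema(schema: StructType): StructType = {
    require(schema($(inputCol)).dataType == StringType,
      s"Column ${$(inputCol)} must be of type StringType")
    StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))
  }

  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}
```

An instance configured with new UpperCaser().setInputCol("text").setOutputCol("upper") can then be called directly via transform or placed in a Pipeline's stages array like any built-in transformer.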

Thanks,
Peter
On 2015-06-02 20:19, dimple wrote:

Hi,
I would like to embed my own transformer in the Spark.ml Pipeline but do
not see an example of it. Can someone share an example of which
classes/interfaces I need to extend/implement in order to do so? Thanks.

Dimple








Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-02 Thread Dimp Bhat
Thanks, Peter. Can you share the Tokenizer.java class for Spark 1.2.1?

Dimple

On Tue, Jun 2, 2015 at 10:51 AM, Peter Rudenko petro.rude...@gmail.com
wrote:

  Hi Dimple,
 take a look at the existing transformers:

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
 (*it's for spark-1.4)

 The idea is just to implement a class that extends Transformer with
 HasInputCol with HasOutputCol (if your transformer is a 1:1 column
 transformer) and has a

 def transform(dataset: DataFrame): DataFrame

 method.

 Thanks,
 Peter
 On 2015-06-02 20:19, dimple wrote:

 Hi,
 I would like to embed my own transformer in the Spark.ml Pipeline but do
 not see an example of it. Can someone share an example of which
 classes/interfaces I need to extend/implement in order to do so? Thanks.

 Dimple








Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-02 Thread Dimp Bhat
Thanks for the quick reply, Ram. I will take a look at the Tokenizer code and
try it out.

Dimple

On Tue, Jun 2, 2015 at 10:42 AM, Ram Sriharsha sriharsha@gmail.com
wrote:

 Hi

 We are in the process of adding examples for feature transformations
 (https://issues.apache.org/jira/browse/SPARK-7546), and these should be
 available shortly on Spark master.
 In the meantime, the best place to start would be to look at how the
 Tokenizer works here:

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

 You need to implement the Transformer interface as above; in this case a
 UnaryTransformer, since the feature transformer acts on one column,
 transforms it, and outputs another column.

 Here is an example of how to build a pipeline that includes a feature
 transformer (HashingTF is the feature transformer analogous to what you
 would build):

 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala

 But stay tuned; we should have examples in Python, Scala, and Java soon.

 Ram

 On Tue, Jun 2, 2015 at 10:19 AM, dimple dimp201...@gmail.com wrote:

 Hi,
 I would like to embed my own transformer in the Spark.ml Pipeline but do
 not see an example of it. Can someone share an example of which
 classes/interfaces I need to extend/implement in order to do so? Thanks.

 Dimple








Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-02 Thread Dimp Bhat
I found this:
https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/ml/feature/Tokenizer.html
Does this indicate that the Tokenizer existed in Spark 1.2.0 but not in
1.2.1?

On Tue, Jun 2, 2015 at 12:45 PM, Peter Rudenko petro.rude...@gmail.com
wrote:

  I'm afraid there's no such class for 1.2.1. This API was added to 1.3.0
 AFAIK.


 On 2015-06-02 21:40, Dimp Bhat wrote:

 Thanks, Peter. Can you share the Tokenizer.java class for Spark 1.2.1?

  Dimple

 On Tue, Jun 2, 2015 at 10:51 AM, Peter Rudenko petro.rude...@gmail.com
 wrote:

  Hi Dimple,
 take a look at the existing transformers:

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
 (*it's for spark-1.4)

 The idea is just to implement a class that extends Transformer with
 HasInputCol with HasOutputCol (if your transformer is a 1:1 column
 transformer) and has a

 def transform(dataset: DataFrame): DataFrame

 method.

 Thanks,
 Peter
 On 2015-06-02 20:19, dimple wrote:

 Hi,
 I would like to embed my own transformer in the Spark.ml Pipeline but do
 not see an example of it. Can someone share an example of which
 classes/interfaces I need to extend/implement in order to do so? Thanks.

 Dimple










Re: Embedding your own transformer in Spark.ml Pipeline

2015-06-02 Thread Ram Sriharsha
Hi

We are in the process of adding examples for feature transformations
(https://issues.apache.org/jira/browse/SPARK-7546), and these should be
available shortly on Spark master.
In the meantime, the best place to start would be to look at how the
Tokenizer works here:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

You need to implement the Transformer interface as above; in this case a
UnaryTransformer, since the feature transformer acts on one column,
transforms it, and outputs another column.
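
To make the UnaryTransformer route concrete, here is a minimal sketch in the style of the linked Tokenizer. The class name TextLength and its behaviour (mapping a string column to its character count) are invented for illustration, and the sketch assumes a Spark release in which transformers take a uid constructor argument (1.4 or later, as far as I know).

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

// Hypothetical example class: one string column in, one integer column out.
class TextLength(override val uid: String)
  extends UnaryTransformer[String, Int, TextLength] {

  def this() = this(Identifiable.randomUID("textLength"))

  // The per-row function; UnaryTransformer applies it to the input column.
  override protected def createTransformFunc: String => Int = _.length

  // Reject non-string input columns when the schema is checked.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType")

  // Spark SQL type of the generated output column.
  override protected def outputDataType: DataType = IntegerType
}
```

Because UnaryTransformer already supplies setInputCol, setOutputCol, transformSchema, and transform, only the row function and the two type hooks need to be written.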

Here is an example of how to build a pipeline that includes a feature
transformer (HashingTF is the feature transformer analogous to what you
would build):
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala
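
For readers who cannot follow the link, the sketch below shows the general shape of such a pipeline: Tokenizer, then HashingTF, then LogisticRegression. It is written in the spirit of the linked example rather than copied from it. It assumes the Spark 2.x and later SparkSession entry point, and the object name PipelineSketch and the toy training rows are made up.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {

  // Chain two feature transformers and an estimator into one Pipeline.
  def buildPipeline(): Pipeline = {
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PipelineSketch")
      .getOrCreate()
    import spark.implicits._

    // Invented toy data: (id, text, label).
    val training = Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    ).toDF("id", "text", "label")

    // Fitting runs each transformer in order and trains the final estimator.
    val model = buildPipeline().fit(training)

    val test = Seq((4L, "spark i j k"), (5L, "l m n")).toDF("id", "text")
    model.transform(test).select("id", "text", "prediction").show()

    spark.stop()
  }
}
```

A custom transformer would take the place of HashingTF (or sit alongside it) in the array passed to setStages.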

But stay tuned; we should have examples in Python, Scala, and Java soon.

Ram

On Tue, Jun 2, 2015 at 10:19 AM, dimple dimp201...@gmail.com wrote:

 Hi,
 I would like to embed my own transformer in the Spark.ml Pipeline but do
 not see an example of it. Can someone share an example of which
 classes/interfaces I need to extend/implement in order to do so? Thanks.

 Dimple


