Re: Embedding your own transformer in Spark.ml Pipeline
Hi Brandon, they are available, but private to the ml package. They are now public in 1.4. For 1.3.1 you can define your transformer in the org.apache.spark.ml package; then you can use these traits. Thanks, Peter Rudenko

On 2015-06-04 20:28, Brandon Plaster wrote: Are HasInputCol and HasOutputCol available in 1.3.1? I'm getting the following messages when I try to implement a Transformer and import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}: "error: object shared is not a member of package org.apache.spark.ml.param" and "error: trait HasInputCol in package param cannot be accessed in package org.apache.spark.ml.param".
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org. For additional commands, e-mail: user-h...@spark.apache.org
Re: Embedding your own transformer in Spark.ml Pipeline
Hi Dimple, take a look at the existing transformers:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala
(*these are for Spark 1.4)
The idea is simply to implement a class that extends Transformer with HasInputCol with HasOutputCol (if your transformer is a 1:1 column transformer) and has a def transform(dataset: DataFrame): DataFrame method. Thanks, Peter

On 2015-06-02 20:19, dimple wrote: Hi, I would like to embed my own transformer in the Spark.ml Pipeline but do not see an example of it. Can someone share an example of which classes/interfaces I need to extend/implement in order to do so? Thanks. Dimple

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Embedding-your-own-transformer-in-Spark-ml-Pipleline-tp23112.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
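A minimal sketch of the recipe Peter describes, written against the Spark 2.x/3.x API rather than 1.4. In those releases `transform` takes a `Dataset[_]` instead of the `DataFrame` mentioned above, and `transformSchema` and `copy` are abstract, so the sketch implements them too. The class name `UpperCaser` and its upper-casing behaviour are hypothetical, chosen only to illustrate a 1:1 column transformer. The shared-param traits supply the `inputCol`/`outputCol` params; the setters are written by hand.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical 1:1 column transformer: writes an upper-cased copy of a string column.
class UpperCaser(override val uid: String)
    extends Transformer with HasInputCol with HasOutputCol {

  // No-arg constructor with a random uid, as the built-in transformers provide.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // The shared-param traits only declare the params; setters are written here.
  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // Validates the input column and describes the output schema without touching data.
  override def transformSchema(schema: StructType): StructType = {
    val inputType = schema($(inputCol)).dataType
    require(inputType == StringType,
      s"Input column ${$(inputCol)} must be StringType but was $inputType")
    require(!schema.fieldNames.contains($(outputCol)),
      s"Output column ${$(outputCol)} already exists")
    schema.add(StructField($(outputCol), StringType, nullable = true))
  }

  // Appends the output column computed from the input column.
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema, logging = true)
    dataset.withColumn($(outputCol), upper(col($(inputCol))))
  }

  // Relies on the (uid: String) constructor to build a copy with extra params.
  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}
```

For example, `new UpperCaser().setInputCol("text").setOutputCol("textUpper").transform(df)` returns `df` with an upper-cased copy of `text` in `textUpper`. Because it is an ordinary `Transformer`, the same instance can also be listed as a stage in a `Pipeline`.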
Re: Embedding your own transformer in Spark.ml Pipeline
Thanks, Peter. Can you share the Tokenizer.java class for Spark 1.2.1? Dimple
Re: Embedding your own transformer in Spark.ml Pipeline
Thanks for the quick reply, Ram. I will take a look at the Tokenizer code and try it out. Dimple
Re: Embedding your own transformer in Spark.ml Pipeline
I found this: https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/ml/feature/Tokenizer.html
Does that mean the Tokenizer existed in Spark 1.2.0 but not in 1.2.1?

On Tue, Jun 2, 2015 at 12:45 PM, Peter Rudenko petro.rude...@gmail.com wrote: I'm afraid there's no such class for 1.2.1. This API was added in 1.3.0, AFAIK.
Re: Embedding your own transformer in Spark.ml Pipeline
Hi, we are in the process of adding examples for feature transformations (https://issues.apache.org/jira/browse/SPARK-7546), and these should be available on Spark master shortly. In the meantime, the best place to start is to look at how the Tokenizer works here: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

You need to implement the Transformer interface as above; in this case, a UnaryTransformer, since the feature transformer acts on one column, transforms it, and outputs another column.

Here is an example of how to build a pipeline that includes a feature transformer (HashingTF is the feature transformer analogous to what you would build): https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala

Stay tuned, though; we should have examples in Python, Scala and Java soon. Ram
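A minimal sketch of Ram's suggestion, again assuming the Spark 2.x/3.x API. `CommaTokenizer` is a hypothetical `UnaryTransformer` that lower-cases a string column and splits it on commas, following the structure of the built-in Tokenizer. `CommaTokenizerPipeline.build()` is a hypothetical helper that chains it with the built-in `HashingTF`, much as SimpleTextClassificationPipeline chains Tokenizer and HashingTF. The column names `tags`, `tokens` and `features` and the feature count of 1024 are illustrative assumptions.

```scala
import org.apache.spark.ml.{Pipeline, UnaryTransformer}
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Hypothetical UnaryTransformer: lower-cases a string and splits it on commas.
class CommaTokenizer(override val uid: String)
    extends UnaryTransformer[String, Seq[String], CommaTokenizer] {

  def this() = this(Identifiable.randomUID("commaTok"))

  // Per-row function that UnaryTransformer applies to the input column.
  override protected def createTransformFunc: String => Seq[String] =
    (text: String) => text.toLowerCase.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Checked by UnaryTransformer.transformSchema before any data is processed.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType")

  // Declared type of the output column.
  override protected def outputDataType: DataType = ArrayType(StringType, containsNull = true)
}

// Hypothetical helper chaining the custom stage with the built-in HashingTF.
object CommaTokenizerPipeline {
  def build(): Pipeline = {
    val tokenizer = new CommaTokenizer()
      .setInputCol("tags")
      .setOutputCol("tokens")
    val hashingTF = new HashingTF()
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
      .setNumFeatures(1024)
    new Pipeline().setStages(Array(tokenizer, hashingTF))
  }
}
```

Because UnaryTransformer already supplies `setInputCol`/`setOutputCol`, schema validation and the column-appending `transform`, a subclass only provides the per-row function, the output type and, optionally, an input-type check. A plain `Transformer` with the shared-param traits is the more general route when the logic is not a row-wise function of a single column.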