Re: How to add custom steps to Pipeline models?
Hi,

If it's Python I can't help. I'm with Scala.

Jacek

On 14 Aug 2016 9:27 p.m., "Evan Zamir" wrote:
> Thanks, but I should have been more clear that I'm trying to do this in
> PySpark, not Scala. Using an example I found on SO, I was able to implement
> a Pipeline step in Python, but it seems it is more difficult (perhaps
> currently impossible) to make it persist to disk (I tried implementing
> _to_java method to no avail). Any ideas about that?
Re: How to add custom steps to Pipeline models?
Thanks, but I should have been more clear that I'm trying to do this in PySpark, not Scala. Using an example I found on SO, I was able to implement a Pipeline step in Python, but it seems it is more difficult (perhaps currently impossible) to make it persist to disk (I tried implementing _to_java method to no avail). Any ideas about that?

On Sun, Aug 14, 2016 at 6:02 PM Jacek Laskowski wrote:
> Hi,
>
> It should just work if you followed the Transformer interface [1].
> When you have the transformers, creating a Pipeline is a matter of
> setting them as additional stages (using Pipeline.setStages [2]).
>
> [1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala
> [2] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L107
Re: How to add custom steps to Pipeline models?
Hi,

It should just work if you followed the Transformer interface [1]. When you have the transformers, creating a Pipeline is a matter of setting them as additional stages (using Pipeline.setStages [2]).

[1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala
[2] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L107

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Aug 12, 2016 at 9:19 AM, evanzamir wrote:
> I'm building an LDA Pipeline, currently with 4 steps, Tokenizer,
> StopWordsRemover, CountVectorizer, and LDA. I would like to add more steps,
> for example, stemming and lemmatization, and also 1-gram and 2-grams (which
> I believe is not supported by the default NGram class). Is there a way to
> add these steps? In sklearn, you can create classes with fit() and
> transform() methods, and that should be enough. Is that true in Spark ML as
> well (or something similar)?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-custom-steps-to-Pipeline-models-tp27522.html
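
For anyone reading the archive later, here is a rough, dependency-free Python sketch of the "1-gram and 2-grams" step from the question, written against the sklearn-style fit()/transform() contract mentioned above. It is not Spark code: `ngrams_up_to`, `UpToNGram`, and `SimplePipeline` are hypothetical names made up for illustration, and rows are plain lists of tokens rather than DataFrame columns. In PySpark, the same per-row function would presumably be wrapped in a subclass of `pyspark.ml.Transformer` (in its `_transform` method) and passed to `Pipeline(stages=[...])`; that wiring is not shown here.

```python
from typing import List, Sequence


def ngrams_up_to(tokens: Sequence[str], max_n: int = 2) -> List[str]:
    """Return every n-gram for n = 1..max_n, each joined by a single space."""
    out: List[str] = []
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


class UpToNGram:
    """Hypothetical stateless step exposing fit() and transform()."""

    def __init__(self, max_n: int = 2) -> None:
        self.max_n = max_n

    def fit(self, rows: Sequence[Sequence[str]]) -> "UpToNGram":
        # Nothing is learned from the data, so fit() just returns the step.
        return self

    def transform(self, rows: Sequence[Sequence[str]]) -> List[List[str]]:
        return [ngrams_up_to(row, self.max_n) for row in rows]


class SimplePipeline:
    """Hypothetical stand-in for a pipeline: runs the stages in order."""

    def __init__(self, stages: Sequence[UpToNGram]) -> None:
        self.stages = list(stages)

    def fit_transform(self, rows: Sequence[Sequence[str]]) -> List[List[str]]:
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return list(rows)


if __name__ == "__main__":
    pipeline = SimplePipeline([UpToNGram(max_n=2)])
    print(pipeline.fit_transform([["spark", "ml", "pipeline"]]))
```

Keeping the n-gram logic in a plain function means it can be unit-tested without a Spark session and then reused inside whatever stage class ends up in the real pipeline.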