Re: How to add custom steps to Pipeline models?

2016-08-14 Thread Jacek Laskowski
Hi,

If it's Python, I can't help. I work with Scala.

Jacek



Re: How to add custom steps to Pipeline models?

2016-08-14 Thread Evan Zamir
Thanks, but I should have been clearer that I'm trying to do this in
PySpark, not Scala. Using an example I found on SO, I was able to implement
a Pipeline step in Python, but it seems to be more difficult (perhaps
currently impossible) to persist it to disk (I tried implementing a
_to_java method, to no avail). Any ideas about that?



Re: How to add custom steps to Pipeline models?

2016-08-14 Thread Jacek Laskowski
Hi,

It should just work if your custom steps follow the Transformer interface [1].
Once you have the transformers, creating a Pipeline is a matter of
setting them as additional stages (using Pipeline.setStages [2]).

[1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala
[2] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L107

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Aug 12, 2016 at 9:19 AM, evanzamir  wrote:
> I'm building an LDA Pipeline, currently with 4 steps: Tokenizer,
> StopWordsRemover, CountVectorizer, and LDA. I would like to add more steps,
> for example stemming and lemmatization, and also 1-grams and 2-grams (which
> I believe are not supported by the default NGram class). Is there a way to
> add these steps? In sklearn, you can create classes with fit() and
> transform() methods, and that should be enough. Is that true in Spark ML as
> well (or something similar)?
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-custom-steps-to-Pipeline-models-tp27522.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
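
For the combined 1-gram and 2-gram step asked about in the question above,
the per-row logic might look roughly like the sketch below. This is plain
Python, not Spark code. The function name unigrams_and_bigrams and its sep
parameter are made up for illustration. It assumes the input is the token
list produced by a step such as Tokenizer or StopWordsRemover, and that
2-grams are joined with a single space. In PySpark the function would still
have to be wrapped in a custom Transformer (for instance through a UDF)
before it could be used as a Pipeline stage; that part is not shown.

```python
from typing import List, Sequence


def unigrams_and_bigrams(tokens: Sequence[str], sep: str = " ") -> List[str]:
    """Return all 1-grams followed by all 2-grams of a token sequence.

    Hypothetical helper for illustration; not part of Spark ML.
    """
    tokens = list(tokens)
    # 1-grams are simply the tokens themselves, in order.
    unigrams = tokens[:]
    # 2-grams pair each token with its successor. zip() stops at the
    # shorter argument, so fewer than two tokens yields no 2-grams.
    bigrams = [sep.join(pair) for pair in zip(tokens, tokens[1:])]
    return unigrams + bigrams


if __name__ == "__main__":
    print(unigrams_and_bigrams(["spark", "ml", "pipeline"]))
    # ['spark', 'ml', 'pipeline', 'spark ml', 'ml pipeline']
```

Keeping the token logic in a pure function like this makes it easy to unit
test without a Spark session, whatever the surrounding Pipeline step ends up
looking like.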

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org