Re: How to save spark-ML model in Java?

2017-01-19 Thread Xiaomeng Wan
cv.fit is going to give you a CrossValidatorModel. If you want to extract
the real model that was built, you need to do:

val cvModel = cv.fit(data)

val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]

val model = plmodel.stages(2).asInstanceOf[whatever_model]

then you can call model.save.
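In Java (the language the original question asks about), the same extraction needs explicit casts, since bestModel is typed as Model&lt;?&gt;. A rough sketch, not a definitive implementation, assuming Spark 2.x, a RandomForest classifier at stage index 2 of the pipeline, and illustrative dataset/path names:

```java
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.classification.RandomForestClassificationModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import java.io.IOException;

public class ExtractBestModel {
    public static void saveBestStage(CrossValidator cv, Dataset<Row> data)
            throws IOException {
        CrossValidatorModel cvModel = cv.fit(data);
        // bestModel() returns Model<?>, so Java needs an explicit cast
        PipelineModel plModel = (PipelineModel) cvModel.bestModel();
        // stages() is an array of Transformer; cast the one you built
        RandomForestClassificationModel model =
                (RandomForestClassificationModel) plModel.stages()[2];
        model.write().overwrite().save("output/bestRFModel");
    }
}
```

Saving the extracted stage this way persists only the classifier itself, rather than the whole CrossValidatorModel.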

On 19 January 2017 at 11:31, Minudika Malshan  wrote:

> Hi,
>
> Thanks Rezaul and Asher Krim.
>
> The method suggested by Rezaul works fine for NaiveBayes but still fails
> for RandomForest and Multi-layer perceptron classifier.
> Everything is saved properly up to this stage.
>
> CrossValidator cv = new CrossValidator()
> .setEstimator(pipeline)
> .setEvaluator(evaluator)
> .setEstimatorParamMaps(paramGrid)
> .setNumFolds(folds);
>
> Any idea on how to resolve this?
>
>
>
>
>
> On Thu, Jan 12, 2017 at 9:13 PM, Asher Krim  wrote:
>
>> What version of Spark are you on?
>> Although it's cut off, I think your error is with RandomForestClassifier,
>> is that correct? If so, you should upgrade to spark 2 since I think this
>> class only became writeable/readable in Spark 2 (
>> https://github.com/apache/spark/pull/12118)
>>
>> On Thu, Jan 12, 2017 at 8:43 AM, Md. Rezaul Karim <
>> rezaul.ka...@insight-centre.org> wrote:
>>
>>> Hi Malshan,
>>>
>>> The error says that one (or more) of the estimators/stages is not
>>> writable, i.e. it does not support the model write/overwrite operation.
>>>
>>> Suppose you want to configure an ML pipeline consisting of three stages
>>> (i.e. estimators): tokenizer, hashingTF, and nb:
>>> val nb = new NaiveBayes().setSmoothing(0.1)
>>> val tokenizer = new Tokenizer().setInputCol("label").setOutputCol("words")
>>> val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
>>> val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))
>>>
>>>
>>> Now check whether all the stages are writable. To make it easier, try
>>> saving the stages individually, e.g.:
>>> tokenizer.write.save("path")
>>> hashingTF.write.save("path")
>>> After that suppose you want to perform a 10-fold cross-validation as
>>> follows:
>>> val cv = new CrossValidator()
>>>   .setEstimator(pipeline)
>>>   .setEvaluator(new BinaryClassificationEvaluator)
>>>   .setEstimatorParamMaps(paramGrid)
>>>   .setNumFolds(10)
>>>
>>> Where:
>>> val paramGrid = new ParamGridBuilder()
>>> .addGrid(hashingTF.numFeatures, Array(10,
>>> 100, 1000))
>>> .addGrid(nb.smoothing, Array(0.001, 0.0001))
>>> .build()
>>>
>>> Now the model that you trained using the training set should be writable
>>> if all of the stages are okay:
>>> val model = cv.fit(trainingData)
>>> model.write.overwrite().save("output/NBModel")
>>>
>>>
>>>
>>> Hope that helps.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>> _
>>> *Md. Rezaul Karim*, BSc, MSc
>>> PhD Researcher, INSIGHT Centre for Data Analytics
>>> National University of Ireland, Galway
>>> IDA Business Park, Dangan, Galway, Ireland
>>> Web: http://www.reza-analytics.eu/index.html
>>> 
>>>
>>> On 12 January 2017 at 09:09, Minudika Malshan 
>>> wrote:
>>>
 Hi,

 When I try to save a pipeline model using spark ML (Java) , the
 following exception is thrown.


 java.lang.UnsupportedOperationException: Pipeline write will fail on
 this Pipeline because it contains a stage which does not implement
 Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class
 org.apache.spark.ml.classification.Rand


 Here is my code segment.


 model.write().overwrite().save("mypath");


 How to resolve this?

 Thanks and regards!

 Minudika


>>>
>>
>>
>> --
>> Asher Krim
>> Senior Software Engineer
>>
>
>
>
> --
> *Minudika Malshan*
> Undergraduate
> Department of Computer Science and Engineering
> University of Moratuwa
> Sri Lanka.
> 
>
>
>


Re: How to save spark-ML model in Java?

2017-01-19 Thread Minudika Malshan
Hi,

Thanks Rezaul and Asher Krim.

The method suggested by Rezaul works fine for NaiveBayes but still fails
for RandomForest and Multi-layer perceptron classifier.
Everything is saved properly up to this stage.

CrossValidator cv = new CrossValidator()
.setEstimator(pipeline)
.setEvaluator(evaluator)
.setEstimatorParamMaps(paramGrid)
.setNumFolds(folds);

Any idea on how to resolve this?
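One way to narrow this down before attempting the save is to check which fitted stage is missing MLWritable. A diagnostic sketch in Java, under the assumption that you have already cast bestModel to a PipelineModel as discussed above (the class and method names here are illustrative):

```java
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.util.MLWritable;

public class WritableCheck {
    // Print each stage's uid and whether it can be persisted;
    // Pipeline.write fails if any stage is not MLWritable.
    public static void report(PipelineModel plModel) {
        for (Transformer stage : plModel.stages()) {
            boolean writable = stage instanceof MLWritable;
            System.out.println(stage.uid() + " writable: " + writable);
        }
    }
}
```

Any stage that prints `writable: false` is the one triggering the UnsupportedOperationException.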





On Thu, Jan 12, 2017 at 9:13 PM, Asher Krim  wrote:

> What version of Spark are you on?
> Although it's cut off, I think your error is with RandomForestClassifier,
> is that correct? If so, you should upgrade to spark 2 since I think this
> class only became writeable/readable in Spark 2 (
> https://github.com/apache/spark/pull/12118)
>
> On Thu, Jan 12, 2017 at 8:43 AM, Md. Rezaul Karim <
> rezaul.ka...@insight-centre.org> wrote:
>
>> Hi Malshan,
>>
>> The error says that one (or more) of the estimators/stages is not
>> writable, i.e. it does not support the model write/overwrite operation.
>>
>> Suppose you want to configure an ML pipeline consisting of three stages
>> (i.e. estimators): tokenizer, hashingTF, and nb:
>> val nb = new NaiveBayes().setSmoothing(0.1)
>> val tokenizer = new Tokenizer().setInputCol("label").setOutputCol("words")
>> val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
>> val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))
>>
>>
>> Now check whether all the stages are writable. To make it easier, try
>> saving the stages individually, e.g.:
>> tokenizer.write.save("path")
>> hashingTF.write.save("path")
>> After that suppose you want to perform a 10-fold cross-validation as
>> follows:
>> val cv = new CrossValidator()
>>   .setEstimator(pipeline)
>>   .setEvaluator(new BinaryClassificationEvaluator)
>>   .setEstimatorParamMaps(paramGrid)
>>   .setNumFolds(10)
>>
>> Where:
>> val paramGrid = new ParamGridBuilder()
>> .addGrid(hashingTF.numFeatures, Array(10,
>> 100, 1000))
>> .addGrid(nb.smoothing, Array(0.001, 0.0001))
>> .build()
>>
>> Now the model that you trained using the training set should be writable
>> if all of the stages are okay:
>> val model = cv.fit(trainingData)
>> model.write.overwrite().save("output/NBModel")
>>
>>
>>
>> Hope that helps.
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>> _
>> *Md. Rezaul Karim*, BSc, MSc
>> PhD Researcher, INSIGHT Centre for Data Analytics
>> National University of Ireland, Galway
>> IDA Business Park, Dangan, Galway, Ireland
>> Web: http://www.reza-analytics.eu/index.html
>> 
>>
>> On 12 January 2017 at 09:09, Minudika Malshan 
>> wrote:
>>
>>> Hi,
>>>
>>> When I try to save a pipeline model using spark ML (Java) , the
>>> following exception is thrown.
>>>
>>>
>>> java.lang.UnsupportedOperationException: Pipeline write will fail on
>>> this Pipeline because it contains a stage which does not implement
>>> Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class
>>> org.apache.spark.ml.classification.Rand
>>>
>>>
>>> Here is my code segment.
>>>
>>>
>>> model.write().overwrite().save("mypath");
>>>
>>>
>>> How to resolve this?
>>>
>>> Thanks and regards!
>>>
>>> Minudika
>>>
>>>
>>
>
>
> --
> Asher Krim
> Senior Software Engineer
>



-- 
*Minudika Malshan*
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.



Re: How to save spark-ML model in Java?

2017-01-12 Thread Asher Krim
What version of Spark are you on?
Although it's cut off, I think your error is with RandomForestClassifier,
is that correct? If so, you should upgrade to spark 2 since I think this
class only became writeable/readable in Spark 2 (
https://github.com/apache/spark/pull/12118)

On Thu, Jan 12, 2017 at 8:43 AM, Md. Rezaul Karim <
rezaul.ka...@insight-centre.org> wrote:

> Hi Malshan,
>
> The error says that one (or more) of the estimators/stages is not
> writable, i.e. it does not support the model write/overwrite operation.
>
> Suppose you want to configure an ML pipeline consisting of three stages
> (i.e. estimators): tokenizer, hashingTF, and nb:
> val nb = new NaiveBayes().setSmoothing(0.1)
> val tokenizer = new Tokenizer().setInputCol("label").setOutputCol("words")
> val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
>
> val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))
>
>
> Now check whether all the stages are writable. To make it easier, try
> saving the stages individually, e.g.:
> tokenizer.write.save("path")
> hashingTF.write.save("path")
> After that suppose you want to perform a 10-fold cross-validation as
> follows:
> val cv = new CrossValidator()
>   .setEstimator(pipeline)
>   .setEvaluator(new BinaryClassificationEvaluator)
>   .setEstimatorParamMaps(paramGrid)
>   .setNumFolds(10)
>
> Where:
> val paramGrid = new ParamGridBuilder()
> .addGrid(hashingTF.numFeatures, Array(10,
> 100, 1000))
> .addGrid(nb.smoothing, Array(0.001, 0.0001))
> .build()
>
> Now the model that you trained using the training set should be writable
> if all of the stages are okay:
> val model = cv.fit(trainingData)
> model.write.overwrite().save("output/NBModel")
>
>
>
> Hope that helps.
>
>
>
>
>
>
>
> Regards,
> _
> *Md. Rezaul Karim*, BSc, MSc
> PhD Researcher, INSIGHT Centre for Data Analytics
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html
> 
>
> On 12 January 2017 at 09:09, Minudika Malshan 
> wrote:
>
>> Hi,
>>
>> When I try to save a pipeline model using spark ML (Java) , the following
>> exception is thrown.
>>
>>
>> java.lang.UnsupportedOperationException: Pipeline write will fail on
>> this Pipeline because it contains a stage which does not implement
>> Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class
>> org.apache.spark.ml.classification.Rand
>>
>>
>> Here is my code segment.
>>
>>
>> model.write().overwrite().save("mypath");
>>
>>
>> How to resolve this?
>>
>> Thanks and regards!
>>
>> Minudika
>>
>>
>


-- 
Asher Krim
Senior Software Engineer


Re: How to save spark-ML model in Java?

2017-01-12 Thread Md. Rezaul Karim
Hi Malshan,

The error says that one (or more) of the estimators/stages is not
writable, i.e. it does not support the model write/overwrite operation.

Suppose you want to configure an ML pipeline consisting of three stages
(i.e. estimators): tokenizer, hashingTF, and nb:
val nb = new NaiveBayes().setSmoothing(0.1)
val tokenizer = new Tokenizer().setInputCol("label").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")

val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))


Now check whether all the stages are writable. To make it easier, try
saving the stages individually, e.g.:
tokenizer.write.save("path")
hashingTF.write.save("path")
After that suppose you want to perform a 10-fold cross-validation as
follows:
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(10)

Where:
val paramGrid = new ParamGridBuilder()
.addGrid(hashingTF.numFeatures, Array(10, 100,
1000))
.addGrid(nb.smoothing, Array(0.001, 0.0001))
.build()

Now the model that you trained using the training set should be writable if
all of the stages are okay:
val model = cv.fit(trainingData)
model.write.overwrite().save("output/NBModel")
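For completeness, the corresponding save-and-reload round trip in Java might look like the following sketch. This assumes Spark 2.x, where CrossValidatorModel implements MLWritable (as Asher notes, persistence for several model classes only landed in Spark 2); the path is illustrative:

```java
import org.apache.spark.ml.tuning.CrossValidatorModel;
import java.io.IOException;

public class SaveAndReload {
    public static CrossValidatorModel roundTrip(CrossValidatorModel model)
            throws IOException {
        // Persist the whole tuned model, overwriting any previous run
        model.write().overwrite().save("output/NBModel");
        // load() restores the best pipeline and its tuning metadata
        return CrossValidatorModel.load("output/NBModel");
    }
}
```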



Hope that helps.







Regards,
_
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html


On 12 January 2017 at 09:09, Minudika Malshan  wrote:

> Hi,
>
> When I try to save a pipeline model using spark ML (Java) , the following
> exception is thrown.
>
>
> java.lang.UnsupportedOperationException: Pipeline write will fail on this
> Pipeline because it contains a stage which does not implement Writable.
> Non-Writable stage: rfc_98f8c9e0bd04 of type class org.apache.spark.ml.
> classification.Rand
>
>
> Here is my code segment.
>
>
> model.write().overwrite().save("mypath");
>
>
> How to resolve this?
>
> Thanks and regards!
>
> Minudika
>
>