Structured Streaming and Hive

2017-09-29 Thread HanPan
Hi guys,

I'm new to Spark Structured Streaming. I'm using Spark 2.1.0, and my scenario
is reading a specific topic from Kafka, doing some data mining tasks, and then
saving the resulting dataset to Hive.

Writing the data to Hive does not seem to be supported yet, so I tried this:



It runs OK, but no results show up in Hive.

Any ideas on how to write the stream results to Hive?

Thanks

Pan

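One possible workaround, sketched in Scala under stated assumptions: Structured Streaming in Spark 2.1 has no Hive sink, but its file sink can append Parquet files into a directory that an external Hive table uses as its LOCATION. The broker address, topic, table name, and paths below are hypothetical, the Kafka source needs the spark-sql-kafka-0-10 package, and the "data mining" step is left as a placeholder. This is a minimal sketch, not a tested production pipeline.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.StreamingQuery

object KafkaToHiveSketch {

  // Read a Kafka topic as a streaming DataFrame with one string column, "value".
  // Requires the spark-sql-kafka-0-10 package; brokers and topic are placeholders.
  def readTopic(spark: SparkSession, brokers: String, topic: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", topic)
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

  // Append each micro-batch as Parquet files under tablePath. The directory is
  // assumed to be the LOCATION of an external Hive table created beforehand,
  // e.g. (hypothetical table name):
  //   CREATE EXTERNAL TABLE stream_results (value STRING)
  //   STORED AS PARQUET LOCATION '<tablePath>';
  // The file sink only supports append mode and needs a checkpoint directory.
  def writeToTableDir(result: DataFrame, tablePath: String, checkpointPath: String): StreamingQuery =
    result.writeStream
      .format("parquet")
      .option("path", tablePath)
      .option("checkpointLocation", checkpointPath)
      .outputMode("append")
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hive-sketch").getOrCreate()
    // Hypothetical broker, topic, and paths.
    val raw = readTopic(spark, "localhost:9092", "events")
    val result = raw // the data mining transformations would go here
    val query = writeToTableDir(result, "/user/hive/warehouse/stream_results", "/tmp/checkpoints/stream_results")
    query.awaitTermination()
  }
}
```

For an unpartitioned external table, Hive lists the directory at query time, so newly written files should become visible without a refresh; a partitioned layout would also need new partitions registered (for example with MSCK REPAIR TABLE). On Spark 2.4 and later, `foreachBatch` with an ordinary batch writer is another option.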

Re: MLPC model can not be saved

2016-03-24 Thread HanPan
Hi Alexander,

Thanks for your reply. The pull request shows that
MultilayerPerceptronClassifier implements the default params writable
interface. I will try that.

Thanks

Pan

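Once the MLPC model is writable, the standard pipeline persistence calls should work unchanged. A minimal Scala sketch, assuming Spark 2.0 or later (which, as far as we know, includes the save/load support from the pull request mentioned in this thread); the toy dataset, layer sizes, and save path are made up for illustration:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MlpcPipelinePersistenceSketch {

  // Fit a one-stage pipeline, save it with the standard ML writer,
  // load it back, and return (fitted model, reloaded model).
  def fitSaveLoad(spark: SparkSession, path: String): (PipelineModel, PipelineModel) = {
    // Toy data, made up for illustration: a binary label and a 2-d feature vector.
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 0.0)),
      (1.0, Vectors.dense(0.0, 1.0)),
      (1.0, Vectors.dense(1.0, 0.0)),
      (0.0, Vectors.dense(1.0, 1.0))
    )).toDF("label", "features")

    // Layers: 2 inputs, one hidden layer of 5 units, 2 output classes.
    val mlpc = new MultilayerPerceptronClassifier()
      .setLayers(Array(2, 5, 2))
      .setMaxIter(100)
      .setSeed(42L)

    val model = new Pipeline().setStages(Array[PipelineStage](mlpc)).fit(training)

    // The save that raised UnsupportedOperationException while the MLPC model
    // was not writable; overwrite() replaces any earlier save at `path`.
    model.write.overwrite().save(path)
    (model, PipelineModel.load(path))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("mlpc-persistence").getOrCreate()
    val (_, reloaded) = fitSaveLoad(spark, "/tmp/mlpc-pipeline-model") // hypothetical path
    println(reloaded.stages.map(_.getClass.getSimpleName).mkString(", "))
    spark.stop()
  }
}
```

Every stage in the pipeline must be writable, which is exactly what the exception in this thread checks. From Java, the equivalent calls are `model.write().overwrite().save(path)` and `PipelineModel.load(path)`.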
From: Ulanov, Alexander [mailto:alexander.ula...@hpe.com]
Sent: March 22, 2016 1:38
To: HanPan; dev@spark.apache.org
Subject: RE: MLPC model can not be saved

Hi Pan,

There is a pull request that is supposed to fix the issue:

https://github.com/apache/spark/pull/9854

There is also a workaround for saving/loading a model (however, I am not sure
whether it will work for the pipeline):

// Save: write the fitted model into an object file using Java serialization
sc.parallelize(Seq(model), 1).saveAsObjectFile("path")

// Load: replace YourCLASS with the class of the saved model
val sameModel = sc.objectFile[YourCLASS]("path").first()

Best regards, Alexander

From: HanPan [mailto:pa...@thinkingdata.cn] 
Sent: Sunday, March 20, 2016 8:32 PM
To: dev@spark.apache.org
Cc: pa...@thinkingdata.cn
Subject: MLPC model can not be saved

Hi Guys,

I built an ML pipeline that includes a multilayer perceptron classifier, and
I got the following error message when I tried to save the pipeline model. It
seems the MLPC model cannot be saved, which means I have no way to save the
trained model. Is there any way to save the model so that I can use it for
future predictions?

Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: mlpc_2d8b74f6da60 of type class org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel
    at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
    at org.apache.spark.ml.PipelineModel$PipelineModelWriter.(Pipeline.scala:325)
    at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
    at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:130)
    at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
    at cn.thinkingdata.nlp.spamclassifier.FFNNSpamClassifierPipeLine.main(FFNNSpamClassifierPipeLine.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks

Pan


