[ https://issues.apache.org/jira/browse/SPARK-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-12750.
-------------------------------
    Resolution: Not A Problem

The problem is just what the exception says: the Function is written as an anonymous inner class, so it retains an implicit reference (the this$0 field shown in the serialization stack below) to the enclosing myrddd instance, and that instance is not serializable.
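For illustration, a minimal sketch of one way to avoid the capture, assuming the Spark 1.x Java MLlib API; the class name ColumnSelector and its field names are hypothetical, not part of the original report. A top-level (or static nested) class has no enclosing-instance reference, and org.apache.spark.api.java.function.Function already extends java.io.Serializable:

    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.regression.LabeledPoint;

    // Hypothetical replacement for the anonymous inner class quoted below:
    // a top-level Function has no this$0 field, so serializing it only
    // requires its own fields to be serializable (an int[] is).
    class ColumnSelector implements Function<LabeledPoint, LabeledPoint> {
        private final int[] indices;

        ColumnSelector(int[] indices) {
            this.indices = indices;
        }

        @Override
        public LabeledPoint call(LabeledPoint p) {
            double[] features = p.features().toArray();
            double[] v = new double[indices.length];
            for (int i = 0; i < indices.length; i++) {
                v[i] = features[indices[i]];
            }
            return new LabeledPoint(p.label(), Vectors.dense(v));
        }
    }

The mapping then becomes parsedData.map(new ColumnSelector(ad)). Equivalently, the code quoted below would work unchanged if myrddd implemented java.io.Serializable (and all of its fields were themselves serializable), or if the mapping method were static, since a static context has no enclosing instance to capture.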
> Java class method doesn't work properly
> ----------------------------------------
>
>                 Key: SPARK-12750
>                 URL: https://issues.apache.org/jira/browse/SPARK-12750
>             Project: Spark
>          Issue Type: Question
>            Reporter: Gramce
>
> I use the Java Spark API to transform LabeledPoint data. I want to select several columns from a JavaRDD<LabeledPoint>, for example the first three columns, so I wrote this:
>
>     int[] ad = {1, 2, 3};
>     int b = ad.length;
>     JavaRDD<LabeledPoint> ggd = parsedData.map(
>         new Function<LabeledPoint, LabeledPoint>() {
>             public LabeledPoint call(LabeledPoint a) {
>                 double[] v = new double[b];
>                 for (int i = 0; i < b; i++) {
>                     v[i] = a.features().toArray()[ad[i]];
>                 }
>                 return new LabeledPoint(a.label(), Vectors.dense(v));
>             }
>         });
>
> where parsedData is a JavaRDD<LabeledPoint>. Now I want to convert this into a method, so the code becomes:
>
>     class myrddd {
>         public JavaRDD<LabeledPoint> abcd;
>
>         public myrddd(JavaRDD<LabeledPoint> deff) {
>             abcd = deff;
>         }
>
>         public JavaRDD<LabeledPoint> abcdf(int[] asdf, int b) {
>             JavaRDD<LabeledPoint> bcd = abcd;
>             JavaRDD<LabeledPoint> mms = bcd.map(
>                 new Function<LabeledPoint, LabeledPoint>() {
>                     public LabeledPoint call(LabeledPoint a) {
>                         double[] v = new double[b];
>                         for (int i = 0; i < b; i++) {
>                             v[i] = a.features().toArray()[asdf[i]];
>                         }
>                         return new LabeledPoint(a.label(), Vectors.dense(v));
>                     }
>                 });
>             return mms;
>         }
>     }
>
> and then:
>
>     myrddd ndfs = new myrddd(parsedData);
>     JavaRDD<LabeledPoint> ggdf = ndfs.abcdf(ad, b);
>
> But this doesn't work. Following is the error:
>
> Exception in thread "main" org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
>     at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:318)
>     at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:317)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
>     at org.apache.spark.rdd.RDD.map(RDD.scala:317)
>     at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:93)
>     at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
>     at anbv.qwe.myrddd.abcdf(dfa.java:53)
>     at anbv.qwe.dfa.main(dfa.java:42)
> Caused by: java.io.NotSerializableException: anbv.qwe.myrddd
> Serialization stack:
>     - object not serializable (class: anbv.qwe.myrddd, value: anbv.qwe.myrddd@310aee0b)
>     - field (class: anbv.qwe.myrddd$1, name: this$0, type: class anbv.qwe.myrddd)
>     - object (class anbv.qwe.myrddd$1, anbv.qwe.myrddd$1@4b76aa5a)
>     - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
>     - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
>     at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
>     ... 13 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org