haohao0103 opened a new issue, #467: URL: https://github.com/apache/incubator-hugegraph-toolchain/issues/467
### Bug Type (问题类型)

None

### Before submit

- [X] I had searched in the [issues](https://github.com/apache/hugegraph-toolchain/issues) and found no similar issues.

### Environment (环境信息)

- Server Version: v1.0.0
- Toolchain Version: v1.0.0

### Expected & Actual behavior (期望与实际表现)

Error message:

```
ERROR Printer: Failed to start loading, cause: org.apache.spark.SparkException: Task not serializable
java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Task not serializable
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hugegraph.loader.spark.HugeGraphSparkLoader.load(HugeGraphSparkLoader.java:193)
	at org.apache.hugegraph.loader.spark.HugeGraphSparkLoader.main(HugeGraphSparkLoader.java:88)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:966)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:191)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1054)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1063)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2487)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:1019)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
	at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:1018)
	at org.apache.spark.sql.Dataset.$anonfun$foreachPartition$1(Dataset.scala:2912)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.Dataset.$anonfun$withNewRDDExecutionId$1(Dataset.scala:3695)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3693)
	at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2912)
	at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2923)
	at org.apache.hugegraph.loader.spark.HugeGraphSparkLoader.lambda$load$1(HugeGraphSparkLoader.java:171)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.NotSerializableException: java.util.concurrent.ThreadPoolExecutor
Serialization stack:
	- object not serializable (class: java.util.concurrent.ThreadPoolExecutor, value: java.util.concurrent.ThreadPoolExecutor@65daf1e0[Running, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 0])
	- field (class: org.apache.hugegraph.loader.spark.HugeGraphSparkLoader, name: executor, type: interface java.util.concurrent.ExecutorService)
	- object (class org.apache.hugegraph.loader.spark.HugeGraphSparkLoader, org.apache.hugegraph.loader.spark.HugeGraphSparkLoader@3a21317)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 2)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.hugegraph.loader.spark.HugeGraphSparkLoader, functionalInterfaceMethod=org/apache/spark/api/java/function/ForeachPartitionFunction.call:(Ljava/util/Iterator;)V, implementation=invokeSpecial org/apache/hugegraph/loader/spark/HugeGraphSparkLoader.lambda$null$18e75a97$1:(Lorg/apache/hugegraph/loader/mapping/InputStruct;Ljava/util/Iterator;)V, instantiatedMethodType=(Ljava/util/Iterator;)V, numCaptured=2])
```

Command:

```
sh bin/hugegraph-spark-loader.sh --master local --name spark-hugegraph-loader \
    --file example/spark/struct.json --username admin --token admin \
    --host 127.0.0.1 --port 8080 --graph graph-test
```

Root cause: `HugeGraphSparkLoader` implements `Serializable`, but its `ExecutorService executor` field does not, so Spark fails when serializing the `foreachPartition` closure that captures the loader instance. Marking the field as `transient` fixed the problem in my testing.

### Vertex/Edge example (问题点 / 边数据举例)

_No response_

### Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)

_No response_
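The effect of the proposed `transient` fix can be sketched with a minimal, self-contained example. The `Loader` class below is a hypothetical stand-in for `HugeGraphSparkLoader`, not the real class: it shows that plain Java serialization (which Spark's closure serializer uses here) skips a `transient` field, so the non-serializable `ThreadPoolExecutor` no longer triggers `NotSerializableException`:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TransientFieldDemo {

    // Hypothetical stand-in for HugeGraphSparkLoader: a Serializable
    // class holding a thread pool. ThreadPoolExecutor is not
    // Serializable, so without the `transient` keyword writeObject()
    // would throw java.io.NotSerializableException, exactly as in the
    // stack trace above.
    static class Loader implements Serializable {
        private static final long serialVersionUID = 1L;
        transient ExecutorService executor = Executors.newFixedThreadPool(3);
    }

    // Serialize an object to a byte array with standard Java serialization.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Loader loader = new Loader();
        // With `transient`, the executor is skipped and serialization succeeds.
        byte[] bytes = serialize(loader);
        System.out.println("Serialized Loader to " + bytes.length + " bytes");
        // Shut the pool down so its non-daemon threads let the JVM exit.
        loader.executor.shutdown();
    }
}
```

One caveat of this approach: a `transient` field is `null` after deserialization, so if the executor were ever needed on the Spark executor side it would have to be re-created lazily there. Judging from the trace, the pool is only used on the driver to submit load tasks, in which case dropping it from the serialized closure should be safe.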
