[ https://issues.apache.org/jira/browse/HUDI-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Hagman updated HUDI-2230: ------------------------------ Description: Steps to reproduce: * Enable graphite metrics via props file. Example: {noformat} hoodie.metrics.on=true hoodie.metrics.reporter.type=GRAPHITE hoodie.metrics.graphite.host=<host> hoodie.metrics.graphite.port=<port> hoodie.metrics.graphite.metric.prefix=<metrics_prefix> {noformat} * Run the Deltastreamer * Note the following exception: ``` Exception in thread "main" org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Task not serializable at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165) at org.apache.hudi.common.util.Option.ifPresent(Option.java:96) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Task not serializable at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163) ... 15 more Caused by: org.apache.hudi.exception.HoodieException: Task not serializable at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) at org.apache.spark.SparkContext.clean(SparkContext.scala:2502) at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.map(RDD.scala:421) at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93) at org.apache.spark.api.java.JavaRDDLike.map$(JavaRDDLike.scala:92) at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45) ... at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43) at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76) at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69) at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:399) at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:271) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:625) ... 4 more Caused by: java.io.NotSerializableException: org.apache.hudi.com.codahale.metrics.Timer Serialization stack: - object not serializable (class: org.apache.hudi.com.codahale.metrics.Timer, value: org.apache.hudi.com.codahale.metrics.Timer@e7bcb5c) - field (class: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer) - object (class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics@6704c405) ... - element of array (index: 0) - array (class [Ljava.lang.Object;, size 1) - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;) - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=..., functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call: ``` The important line being: ``` - field (class: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer) ``` __FIX:__ Make the `Timer` member variables in the `HoodieDeltaStreamerMetrics` class transient so that they are not serialized over the wire. was: Steps to reproduce: * Enable graphite metrics via props file. Example: ``` hoodie.metrics.on=true hoodie.metrics.reporter.type=GRAPHITE hoodie.metrics.graphite.host=<host> hoodie.metrics.graphite.port=<port> hoodie.metrics.graphite.metric.prefix=<metrics_prefix> ``` * Run the Deltastreamer * Note the following exception: ``` Exception in thread "main" org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: Task not serializable at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165) at org.apache.hudi.common.util.Option.ifPresent(Option.java:96) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Task not serializable at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163) ... 15 more Caused by: org.apache.hudi.exception.HoodieException: Task not serializable at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) at org.apache.spark.SparkContext.clean(SparkContext.scala:2502) at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.map(RDD.scala:421) at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93) at org.apache.spark.api.java.JavaRDDLike.map$(JavaRDDLike.scala:92) at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45) ... at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43) at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76) at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69) at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:399) at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:271) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:625) ... 4 more Caused by: java.io.NotSerializableException: org.apache.hudi.com.codahale.metrics.Timer Serialization stack: - object not serializable (class: org.apache.hudi.com.codahale.metrics.Timer, value: org.apache.hudi.com.codahale.metrics.Timer@e7bcb5c) - field (class: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer) - object (class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics@6704c405) ... - element of array (index: 0) - array (class [Ljava.lang.Object;, size 1) - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;) - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=..., functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call: ``` The important line being: ``` - field (class: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer) ``` __FIX:__ Make the `Timer` member variables in the `HoodieDeltaStreamerMetrics` class transient so that they are not serialized over the wire. > "Task not serializable" exception due to non-serializable Codahale Timers > ------------------------------------------------------------------------- > > Key: HUDI-2230 > URL: https://issues.apache.org/jira/browse/HUDI-2230 > Project: Apache Hudi > Issue Type: Bug > Components: metrics > Affects Versions: 0.9.0 > Reporter: Dave Hagman > Priority: Major > Fix For: 0.9.0 > > > Steps to reproduce: > * Enable graphite metrics via props file. Example: > {noformat} > hoodie.metrics.on=true > hoodie.metrics.reporter.type=GRAPHITE > hoodie.metrics.graphite.host=<host> > hoodie.metrics.graphite.port=<port> > hoodie.metrics.graphite.metric.prefix=<metrics_prefix> > {noformat} > * Run the Deltastreamer > * Note the following exception: > ``` > Exception in thread "main" org.apache.hudi.exception.HoodieException: > org.apache.hudi.exception.HoodieException: Task not serializable > at > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165) > at org.apache.hudi.common.util.Option.ifPresent(Option.java:96) > at > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160) > at > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieException: Task not serializable > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > at > org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90) > at > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163) > ... 15 more > Caused by: org.apache.hudi.exception.HoodieException: Task not serializable > at > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) > at org.apache.spark.SparkContext.clean(SparkContext.scala:2502) > at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) > at org.apache.spark.rdd.RDD.map(RDD.scala:421) > at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93) > at org.apache.spark.api.java.JavaRDDLike.map$(JavaRDDLike.scala:92) > at > org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45) > ... > at > org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43) > at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76) > at > org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:399) > at > org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:271) > at > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:625) > ... 4 more > Caused by: java.io.NotSerializableException: > org.apache.hudi.com.codahale.metrics.Timer > Serialization stack: > - object not serializable (class: > org.apache.hudi.com.codahale.metrics.Timer, value: > org.apache.hudi.com.codahale.metrics.Timer@e7bcb5c) > - field (class: > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: > overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer) > - object (class > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics@6704c405) > ... > - element of array (index: 0) > - array (class [Ljava.lang.Object;, size 1) > - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, > type: class [Ljava.lang.Object;) > - object (class java.lang.invoke.SerializedLambda, > SerializedLambda[capturingClass=..., > functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call: > ``` > The important line being: > ``` > - field (class: > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: > overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer) > ``` > __FIX:__ > Make the `Timer` member variables in the `HoodieDeltaStreamerMetrics` class > transient so that they are not serialized over the wire. > -- This message was sent by Atlassian Jira (v8.3.4#803005)