[ 
https://issues.apache.org/jira/browse/HUDI-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Hagman updated HUDI-2230:
------------------------------
    Description: 
Steps to reproduce:
 * Enable graphite metrics via props file. Example:


{noformat}
hoodie.metrics.on=true
hoodie.metrics.reporter.type=GRAPHITE
hoodie.metrics.graphite.host=<host>
hoodie.metrics.graphite.port=<port>
hoodie.metrics.graphite.metric.prefix=<metrics_prefix>
{noformat}


 * Run the Deltastreamer
 * Note the following exception:

```
Exception in thread "main" org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: Task not serializable
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: Task not serializable
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163)
        ... 15 more
Caused by: org.apache.hudi.exception.HoodieException: Task not serializable
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Task not serializable
        at 
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
        at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.RDD.map(RDD.scala:421)
        at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93)
        at org.apache.spark.api.java.JavaRDDLike.map$(JavaRDDLike.scala:92)
        at 
org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45)
        ...
        at 
org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
        at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
        at 
org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:399)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:271)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:625)
        ... 4 more
Caused by: java.io.NotSerializableException: 
org.apache.hudi.com.codahale.metrics.Timer
Serialization stack:
        - object not serializable (class: 
org.apache.hudi.com.codahale.metrics.Timer, value: 
org.apache.hudi.com.codahale.metrics.Timer@e7bcb5c)
        - field (class: 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: 
overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer)
        - object (class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics@6704c405)
        ...
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 1)
        - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
type: class [Ljava.lang.Object;)
        - object (class java.lang.invoke.SerializedLambda, 
SerializedLambda[capturingClass=..., 
functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:
```

The important line being:

```
- field (class: 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: 
overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer)
```

__FIX:__

Make the `Timer` member variables in the `HoodieDeltaStreamerMetrics` class 
transient so that they are not serialized over the wire. 
 

  was:
Steps to reproduce:
 * Enable graphite metrics via props file. Example:

```
hoodie.metrics.on=true
hoodie.metrics.reporter.type=GRAPHITE
hoodie.metrics.graphite.host=<host>
hoodie.metrics.graphite.port=<port>
hoodie.metrics.graphite.metric.prefix=<metrics_prefix>
```

 * Run the Deltastreamer
 * Note the following exception:

```
Exception in thread "main" org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: Task not serializable
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: Task not serializable
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163)
        ... 15 more
Caused by: org.apache.hudi.exception.HoodieException: Task not serializable
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Task not serializable
        at 
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
        at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.RDD.map(RDD.scala:421)
        at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93)
        at org.apache.spark.api.java.JavaRDDLike.map$(JavaRDDLike.scala:92)
        at 
org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45)
        ...
        at 
org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
        at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
        at 
org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:399)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:271)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:625)
        ... 4 more
Caused by: java.io.NotSerializableException: 
org.apache.hudi.com.codahale.metrics.Timer
Serialization stack:
        - object not serializable (class: 
org.apache.hudi.com.codahale.metrics.Timer, value: 
org.apache.hudi.com.codahale.metrics.Timer@e7bcb5c)
        - field (class: 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: 
overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer)
        - object (class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics@6704c405)
        ...
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 1)
        - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
type: class [Ljava.lang.Object;)
        - object (class java.lang.invoke.SerializedLambda, 
SerializedLambda[capturingClass=..., 
functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:
```

The important line being:

```
- field (class: 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: 
overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer)
```

__FIX:__

Make the `Timer` member variables in the `HoodieDeltaStreamerMetrics` class 
transient so that they are not serialized over the wire. 
 


> "Task not serializable" exception due to non-serializable Codahale Timers
> -------------------------------------------------------------------------
>
>                 Key: HUDI-2230
>                 URL: https://issues.apache.org/jira/browse/HUDI-2230
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.9.0
>            Reporter: Dave Hagman
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Steps to reproduce:
>  * Enable graphite metrics via props file. Example:
> {noformat}
> hoodie.metrics.on=true
> hoodie.metrics.reporter.type=GRAPHITE
> hoodie.metrics.graphite.host=<host>
> hoodie.metrics.graphite.port=<port>
> hoodie.metrics.graphite.metric.prefix=<metrics_prefix>
> {noformat}
>  * Run the Deltastreamer
>  * Note the following exception:
> ```
> Exception in thread "main" org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: Task not serializable
>       at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165)
>       at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
>       at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160)
>       at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
>       at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>       at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: Task not serializable
>       at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>       at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>       at 
> org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90)
>       at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163)
>       ... 15 more
> Caused by: org.apache.hudi.exception.HoodieException: Task not serializable
>       at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649)
>       at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.spark.SparkException: Task not serializable
>       at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
>       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
>       at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
>       at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>       at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>       at org.apache.spark.rdd.RDD.map(RDD.scala:421)
>       at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93)
>       at org.apache.spark.api.java.JavaRDDLike.map$(JavaRDDLike.scala:92)
>       at 
> org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45)
>       ...
>       at 
> org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
>       at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
>       at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69)
>       at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:399)
>       at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:271)
>       at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:625)
>       ... 4 more
> Caused by: java.io.NotSerializableException: 
> org.apache.hudi.com.codahale.metrics.Timer
> Serialization stack:
>       - object not serializable (class: 
> org.apache.hudi.com.codahale.metrics.Timer, value: 
> org.apache.hudi.com.codahale.metrics.Timer@e7bcb5c)
>       - field (class: 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: 
> overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer)
>       - object (class 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics@6704c405)
>       ...
>       - element of array (index: 0)
>       - array (class [Ljava.lang.Object;, size 1)
>       - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
> type: class [Ljava.lang.Object;)
>       - object (class java.lang.invoke.SerializedLambda, 
> SerializedLambda[capturingClass=..., 
> functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:
> ```
> The important line being:
> ```
> - field (class: 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics, name: 
> overallTimer, type: class org.apache.hudi.com.codahale.metrics.Timer)
> ```
> __FIX:__
> Make the `Timer` member variables in the `HoodieDeltaStreamerMetrics` class 
> transient so that they are not serialized over the wire. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to