[ https://issues.apache.org/jira/browse/CASSANALYTICS-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010648#comment-18010648 ]
Francisco Guerrero commented on CASSANALYTICS-80:
-------------------------------------------------
This is something I've encountered as well. Kryo is never used for
serialization in the bulk reader path. I think removing the code is a valid
option to consider if it is never exercised.
> Kryo serialization for CassandraDataLayer is never used by Spark due to
> broadcast
> ---------------------------------------------------------------------------------
>
> Key: CASSANALYTICS-80
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-80
> Project: Apache Cassandra Analytics
> Issue Type: Improvement
> Reporter: Liu Cao
> Priority: Normal
>
> To reproduce, simply run any integration test for the bulk reader:
> # Change the logging setting or the logging level of [this
> line|https://github.com/apache/cassandra-analytics/blob/trunk/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L759]
> to INFO
> # Run the following:
> ./gradlew cassandra-analytics-integration-tests:test --tests "org.apache.cassandra.analytics.BulkReaderTest.testUsingSingleSidecarContactPoint" --debug > test.out
>
> The test output contains:
> ```
> Falling back to JDK serialization.
> ```
> There is also no sign of the log line
> [here|https://github.com/apache/cassandra-analytics/blob/a6bbbfa8689bd84705943b96444e8d8151376e27/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L828C26-L828C66]
> (Serializing CassandraDataLayer with Kryo), which should have appeared if Kryo
> were used.
>
> Using the debugger, I confirmed that the Spark session was properly
> configured with Kryo: I evaluated the Spark session config at runtime and
> confirmed that the expected registrators and serializer were set. This also
> showed up in the logs:
> ```
> INFO KryoRegister: Setting kryo registrators:
> org.apache.cassandra.spark.bulkwriter.util.SbwKryoRegistrator,org.apache.cassandra.spark.KryoRegister
> ```
> After stepping through the code, I believe the issue is that Spark's
> TorrentBroadcast uses JDK serialization anyway.
>
> Has anyone else tested the Kryo serialization setup? If Kryo is never used,
> maybe we shouldn't bother maintaining the complexity.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)