[ https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-40912.
----------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 38428
[https://github.com/apache/spark/pull/38428]

> Overhead of Exceptions in DeserializationStream
> ------------------------------------------------
>
>                 Key: SPARK-40912
>                 URL: https://issues.apache.org/jira/browse/SPARK-40912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Emil Ejbyfeldt
>            Assignee: Emil Ejbyfeldt
>            Priority: Minor
>             Fix For: 3.5.0
>
>
> The interface of DeserializationStream forces implementations to raise
> EOFException to indicate that there is no more data. For
> KryoDeserializationStream it is even worse: since the Kryo library does
> not raise EOFException, we pay the price of two exceptions for each
> stream. For large shuffles with many small streams this is a significant
> overhead (a couple of percent of CPU time has been observed). It is also
> less safe to depend on exceptions, since one might be raised for other
> reasons, such as corrupt data, and that currently causes data loss.
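
For readers who want to see the pattern the description refers to, here is a minimal sketch in plain Java. It is not the change made in pull request 38428 and it does not use Spark or Kryo. The class EofSketch and its methods are hypothetical names invented for illustration, and the "stream" is assumed to be a sequence of 4-byte integers. The first reader ends on EOFException, so it pays for one exception per stream and cannot tell a clean end from a truncated record. The second reader checks for the end of data explicitly, so an EOFException can only mean bad data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Spark or Kryo code.
final class EofSketch {

    // Encode the values as consecutive 4-byte big-endian integers.
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    // Exception-driven: the loop ends only when readInt throws EOFException.
    // A truncated final record looks exactly like a normal end of stream.
    static List<Integer> readUntilException(byte[] data) throws IOException {
        List<Integer> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            while (true) {
                result.add(in.readInt());
            }
        } catch (EOFException e) {
            // Treated as "no more data", whatever the real cause was.
        }
        return result;
    }

    // Explicit check: peek one byte to detect a clean end of stream.
    // An EOFException from readInt now signals a truncated record.
    static List<Integer> readWithCheck(byte[] data) throws IOException {
        List<Integer> result = new ArrayList<>();
        PushbackInputStream raw = new PushbackInputStream(new ByteArrayInputStream(data));
        DataInputStream in = new DataInputStream(raw);
        while (true) {
            int first = raw.read();
            if (first < 0) {
                break; // clean end of stream, no exception needed
            }
            raw.unread(first);
            result.add(in.readInt());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] ok = encode(1, 2, 3);
        System.out.println(readUntilException(ok)); // prints [1, 2, 3]
        System.out.println(readWithCheck(ok));      // prints [1, 2, 3]

        // Drop the last byte to simulate corrupt data.
        byte[] truncated = Arrays.copyOf(encode(1, 2), 7);
        System.out.println(readUntilException(truncated)); // prints [1], the loss is silent
        try {
            readWithCheck(truncated);
        } catch (EOFException e) {
            System.out.println("truncated record detected");
        }
    }
}
```

With the truncated input, the exception-driven reader silently returns a shorter result, which is the data-loss risk the description mentions, while the checked reader surfaces the problem to the caller.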