[ https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-40912:
------------------------------------

    Assignee: Emil Ejbyfeldt

> Overhead of Exceptions in DeserializationStream
> ------------------------------------------------
>
>                 Key: SPARK-40912
>                 URL: https://issues.apache.org/jira/browse/SPARK-40912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Emil Ejbyfeldt
>            Assignee: Emil Ejbyfeldt
>            Priority: Minor
>
> The interface of DeserializationStream forces implementations to raise
> EOFException to indicate that there is no more data. For
> KryoDeserializationStream it is even worse: since the Kryo library does not
> raise EOFException itself, we pay the price of two exceptions for each
> stream. For large shuffles with lots of small streams this is a noticeable
> overhead (we have seen a couple of percent of CPU time). It is also less
> safe to depend on exceptions, as they might be raised for other reasons,
> such as corrupt data, and that currently causes data loss.
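
To illustrate the concern in the description, here is a minimal sketch using only java.io. It is not Spark's code and does not use DeserializationStream or Kryo. The class name EofSignalSketch and its helper methods are hypothetical, and the records are assumed to be plain 4-byte ints. The first reader discovers the end of data only by catching EOFException, so a truncated stream looks the same as a clean end. The second reader checks for remaining bytes explicitly, so an EOFException in the middle of a record propagates as an error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Spark.
final class EofSignalSketch {

    // Encodes each value as a 4-byte big-endian int (assumed record format).
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Exception-driven reader: the only end-of-data signal is EOFException.
    // One exception is thrown and caught per stream, and a record that was
    // cut short is silently treated as a normal end of stream.
    static List<Integer> readUntilEof(byte[] data) throws IOException {
        List<Integer> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            while (true) {
                result.add(in.readInt());
            }
        } catch (EOFException e) {
            // Clean end and truncated data both end up here.
        }
        return result;
    }

    // Explicit-check reader: asks whether bytes remain before each read.
    // available() is exact for ByteArrayInputStream; for other stream types
    // it is only an estimate, so this check is a simplification.
    static List<Integer> readWithExplicitEnd(byte[] data) throws IOException {
        List<Integer> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        while (in.available() > 0) {
            // An EOFException here means a truncated record and is not swallowed.
            result.add(in.readInt());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(1, 2, 3);
        // Drop the last two bytes to simulate a corrupt (truncated) stream.
        byte[] truncated = Arrays.copyOf(data, data.length - 2);

        System.out.println(readUntilEof(data));
        System.out.println(readWithExplicitEnd(data));
        System.out.println(readUntilEof(truncated)); // third record is lost without any error
        try {
            readWithExplicitEnd(truncated);
        } catch (EOFException e) {
            System.out.println("truncation detected");
        }
    }
}
```

With intact input both readers return the same values. With the truncated input the exception-driven reader returns only the first two values and reports nothing, which is the kind of silent data loss the description warns about.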