[ https://issues.apache.org/jira/browse/SPARK-38329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neven Jovic updated SPARK-38329:
--------------------------------
    Attachment: Screenshot from 2022-02-25 14-16-11.png

> High I/O wait when Spark Structured Streaming checkpoint changed to EFS
> -----------------------------------------------------------------------
>
>                 Key: SPARK-38329
>                 URL: https://issues.apache.org/jira/browse/SPARK-38329
>             Project: Spark
>          Issue Type: Question
>          Components: EC2, Input/Output, PySpark, Structured Streaming
>    Affects Versions: 2.4.6
>            Reporter: Neven Jovic
>            Priority: Major
>         Attachments: Screenshot from 2022-02-25 14-16-11.png, q.png
>
>
> I'm currently running a Spark Structured Streaming application written in
> Python (PySpark). My source is a Kafka topic and my sink is MongoDB. I changed
> my checkpoint location to Amazon EFS, which is shared by all Spark workers,
> and since then I/O wait has increased, averaging 8%.
>
> !Screenshot from 2022-02-25 14-16-11.png!
>
> Currently 6000 messages arrive in Kafka every second, and every once in a
> while I get this WARN message:
> {quote}22/02/25 13:12:31 WARN HDFSBackedStateStoreProvider: Error cleaning up
> files for HDFSStateStoreProvider[id = (op=0,part=90),dir =
> file:/mnt/efs_max_io/spark/state/0/90] java.lang.NumberFormatException: For
> input string: ""
> {quote}
> I'm not sure whether that message has anything to do with the high I/O wait.
> Is this behavior expected, or is it something to be concerned about?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
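One possible reading of the WARN message, offered as a hypothesis rather than a confirmed diagnosis: during cleanup the state store lists the files in a partition directory such as `/mnt/efs_max_io/spark/state/0/90` and expects names of the form `<version>.delta` or `<version>.snapshot`. Assume it splits each name on `.` the way Java's `String.split("\\.")` does and parses the first part of any two-part name as a long. This is a simplified reading of the Spark 2.4 `HDFSBackedStateStoreProvider` source. Under that assumption, a stray file whose name starts with a dot and contains no other dot produces an empty version string, which matches `For input string: ""`. NFS clients (EFS is mounted over NFS) can leave `.nfsXXXX` "silly rename" files behind when a file is deleted while still open, and those names would fit that pattern. The Python sketch models only the parsing step. The function names and the `.nfs...` file name are illustrative and do not come from Spark or from the report.

```python
"""Simplified, hypothetical model of state-file name parsing during cleanup.

Assumptions (not confirmed by the report): state files are named
"<version>.delta" or "<version>.snapshot", names are split on "." the way
Java's String.split("\\.") does, and any two-part name has its first part
parsed as a long. All names here are illustrative only.
"""

import re
from typing import List, Optional, Tuple

# Roughly what Java's Long.parseLong accepts: optional sign, ASCII digits.
_LONG_PATTERN = re.compile(r"[+-]?[0-9]+")


def java_style_split_on_dot(name: str) -> List[str]:
    """Split on '.', dropping trailing (but not leading) empty strings."""
    parts = name.split(".")
    while parts and parts[-1] == "":
        parts.pop()
    return parts


def parse_state_file(name: str) -> Optional[Tuple[int, str]]:
    """Return (version, suffix) for a two-part name, or None if it is skipped.

    Raises ValueError (standing in for java.lang.NumberFormatException)
    when the part before the dot is not an integer, e.g. an empty string.
    """
    parts = java_style_split_on_dot(name)
    if len(parts) != 2:
        return None  # Names with any other number of parts are ignored.
    version_str, suffix = parts
    if not _LONG_PATTERN.fullmatch(version_str):
        raise ValueError(f'For input string: "{version_str}"')
    return int(version_str), suffix.lower()


if __name__ == "__main__":
    # ".nfs..." mimics an NFS "silly rename" file; it is a made-up example.
    names = ["1.delta", "10.snapshot", ".1.delta.crc", ".nfs0000000000a1b2c300000001"]
    for name in names:
        try:
            print(f"{name!r} -> {parse_state_file(name)}")
        except ValueError as err:
            print(f"{name!r} -> ValueError: {err}")
```

Run as a script, `1.delta` and `10.snapshot` parse to version/suffix pairs, the four-part checksum-style name `.1.delta.crc` is skipped, and the `.nfs...` name raises `ValueError: For input string: ""`. If this reading is right, the exception aborts that cleanup pass for the partition, so old delta and snapshot files may accumulate there. The report does not establish whether that contributes to the I/O wait.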