[ https://issues.apache.org/jira/browse/SPARK-38557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-38557. ---------------------------------- Resolution: Invalid > What may be a cause for HDFSMetadataCommitter: Error while fetching MetaData > and how to fix or work around this? > ---------------------------------------------------------------------------------------------------------------- > > Key: SPARK-38557 > URL: https://issues.apache.org/jira/browse/SPARK-38557 > Project: Spark > Issue Type: Question > Components: Structured Streaming > Affects Versions: 3.1.1 > Environment: Spark 3.1.1 > AWS EMR 6.3.0 > python 3.7.2 > Reporter: Dmitry Goldenberg > Priority: Major > > I'm seeing errors such as the below when executing structured Spark Streaming > app which streams data from AWS Kinesis. > > I've googled the error but can't tell what may be the cause. Is Spark running > out of disk space? something else? > {code:java} > // From the stderr log in EMR > 22/03/15 00:54:00 WARN HDFSMetadataCommitter: Error while fetching MetaData > [attempt = 1] > java.lang.IllegalStateException: > hdfs://ip-10-2-XXX-XXX.awsinternal.acme.com:8020/mnt/tmp/temporary-03b8fecf-32d5-422c-9375-4c3450ed0bb8/sources/0/shard-commit/0 > does not exist > at > org.apache.spark.sql.kinesis.HDFSMetadataCommitter.$anonfun$get$1(HDFSMetadataCommitter.scala:163) > at > org.apache.spark.sql.kinesis.HDFSMetadataCommitter.withRetry(HDFSMetadataCommitter.scala:229) > at > org.apache.spark.sql.kinesis.HDFSMetadataCommitter.get(HDFSMetadataCommitter.scala:151) > at > org.apache.spark.sql.kinesis.KinesisSource.prevBatchShardInfo(KinesisSource.scala:275) > at > org.apache.spark.sql.kinesis.KinesisSource.getOffset(KinesisSource.scala:163) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$6(MicroBatchExecution.scala:399) > at > org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357) > at > org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$2(MicroBatchExecution.scala:399) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.Map$Map1.foreach(Map.scala:128) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$1(MicroBatchExecution.scala:382) > at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:613) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.constructNextBatch(MicroBatchExecution.scala:378) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:211) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357) > at > org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:194) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:188) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:333) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:244){code} > -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org