[ https://issues.apache.org/jira/browse/SPARK-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848403#comment-15848403 ]
Steve Loughran commented on SPARK-19111: ---------------------------------------- Charles, you might also want to keep an eye on: HADOOP-14028 . Until that fix is in, s3a will leak disk space for the life of the process, here a 50GB file. > S3 Mesos history upload fails silently if too large > --------------------------------------------------- > > Key: SPARK-19111 > URL: https://issues.apache.org/jira/browse/SPARK-19111 > Project: Spark > Issue Type: Bug > Components: EC2, Mesos, Spark Core > Affects Versions: 2.0.0 > Reporter: Charles Allen > > {code} > 2017-01-06T21:32:32,928 INFO [main] org.apache.spark.ui.SparkUI - Stopped > Spark web UI at http://REDACTED:4041 > 2017-01-06T21:32:32,938 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.jvmGCTime > 2017-01-06T21:32:32,939 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.shuffle.read.localBlocksFetched > 2017-01-06T21:32:32,939 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.resultSerializationTime > 2017-01-06T21:32:32,939 ERROR [heartbeat-receiver-event-loop-thread] > org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already > stopped! Dropping event SparkListenerExecutorMetricsUpdate( > 364,WrappedArray()) > 2017-01-06T21:32:32,939 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.resultSize > 2017-01-06T21:32:32,939 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.peakExecutionMemory > 2017-01-06T21:32:32,939 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.shuffle.read.fetchWaitTime > 2017-01-06T21:32:32,939 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.memoryBytesSpilled > 2017-01-06T21:32:32,940 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.shuffle.read.remoteBytesRead > 2017-01-06T21:32:32,940 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.diskBytesSpilled > 2017-01-06T21:32:32,940 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.shuffle.read.localBytesRead > 2017-01-06T21:32:32,940 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.shuffle.read.recordsRead > 2017-01-06T21:32:32,940 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.executorDeserializeTime > 2017-01-06T21:32:32,940 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: output/bytes > 2017-01-06T21:32:32,941 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.executorRunTime > 2017-01-06T21:32:32,941 INFO [SparkListenerBus] > com.metamx.starfire.spark.SparkDriver - emitting metric: > internal.metrics.shuffle.read.remoteBlocksFetched > 2017-01-06T21:32:32,943 INFO [main] > org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key > 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1387.inprogress' > closed. Now beginning upload > 2017-01-06T21:32:32,963 ERROR [heartbeat-receiver-event-loop-thread] > org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already > stopped! Dropping event SparkListenerExecutorMetricsUpdate(905,WrappedArray()) > 2017-01-06T21:32:32,973 ERROR [heartbeat-receiver-event-loop-thread] > org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already > stopped! Dropping event SparkListenerExecutorMetricsUpdate(519,WrappedArray()) > 2017-01-06T21:32:32,988 ERROR [heartbeat-receiver-event-loop-thread] > org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already > stopped! Dropping event SparkListenerExecutorMetricsUpdate(596,WrappedArray()) > {code} > Running spark on mesos, some large jobs fail to upload to the history server > storage! > A successful sequence of events in the log that yield an upload are as > follows: > {code} > 2017-01-06T19:14:32,925 INFO [main] > org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key > 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' > writing to tempfile '/mnt/tmp/hadoop/output-2516573909248961808.tmp' > 2017-01-06T21:59:14,789 INFO [main] > org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key > 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' > closed. Now beginning upload > 2017-01-06T21:59:44,679 INFO [main] > org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key > 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' > upload complete > {code} > But large jobs do not ever get to the {{upload complete}} log message, and > instead exit before completion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org