[ https://issues.apache.org/jira/browse/SPARK-35428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426440#comment-17426440 ]

Jungtaek Lim commented on SPARK-35428:
--------------------------------------

I think this is due to a characteristic of S3. S3 doesn't support "append", 
and a file is only uploaded to S3 once Spark "closes" it. That is not 
something we can fix.

Instead, since Spark 3.0 we have provided rolling event logs, which split the 
event log into multiple parts. Each part can be uploaded once the log is 
rolled. It still isn't real-time, though.

https://spark.apache.org/docs/latest/monitoring.html#applying-compaction-on-rolling-event-log-files
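For reference, a minimal spark-defaults.conf sketch for enabling rolling event logs, based on the docs linked above (the size and retain values here are illustrative, not recommendations):

```properties
# Driver side: roll the event log into multiple files instead of
# one ever-growing .inprogress file, so finished parts can be uploaded
spark.eventLog.enabled                              true
spark.eventLog.rolling.enabled                      true
spark.eventLog.rolling.maxFileSize                  128m

# History server side (optional): compact older rolled files,
# retaining only the most recent N uncompacted files
spark.history.fs.eventLog.rolling.maxFilesToRetain  2
```

Note this requires Spark 3.0+; the reporter's Spark 2.4.5 doesn't have these configs.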

> Spark history Server to S3 doesn't show incomplete applications
> ---------------------------------------------------------------
>
>                 Key: SPARK-35428
>                 URL: https://issues.apache.org/jira/browse/SPARK-35428
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.5
>         Environment: Jupyter Notebook sparkmagic with Spark(2.4.5)  client 
> mode running on Kubernetes
>            Reporter: Tianbin Jiang
>            Priority: Major
>
> Jupyter Notebook sparkmagic with Spark(2.4.5)  client mode running on 
> Kubernetes.  I am redirecting the spark event logs to a S3 with the following 
> configuration:
>   
>  spark.eventLog.enabled = true
>  spark.history.ui.port = 18080
>  spark.eventLog.dir = s3://livy-spark-log/spark-history/
>  spark.history.fs.logDirectory = s3://livy-spark-log/spark-history/
>  spark.history.fs.update.interval = 5s
>  spark.eventLog.buffer.kb = 1k
>  spark.streaming.driver.writeAheadLog.closeFileAfterWrite = true
>  spark.streaming.receiver.writeAheadLog.closeFileAfterWrite = true
>   
>  Once my application is completed, I can see it show up on the Spark history 
> server. However, running applications don't show up under "incomplete 
> applications". I have also checked the log; whenever my application ends, I 
> can see these messages:
>   
>  {{21/05/17 06:14:18 INFO k8s.KubernetesClusterSchedulerBackend: Shutting 
> down all executors}}
>  {{21/05/17 06:14:18 INFO 
> k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down}}
>  {{21/05/17 06:14:18 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes 
> client has been closed (this is expected if the application is shutting 
> down.)}}
>  *{{21/05/17 06:14:18 INFO s3n.MultipartUploadOutputStream: close 
> closed:false 
> s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6.inprogress}}*
>  *{{21/05/17 06:14:19 INFO s3n.S3NativeFileSystem: rename 
> s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6.inprogress
>  s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6}}*
>  {{21/05/17 06:14:19 INFO spark.MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!}}
>  {{21/05/17 06:14:19 INFO memory.MemoryStore: MemoryStore cleared}}
>  {{21/05/17 06:14:19 INFO storage.BlockManager: BlockManager stopped}}
>   
>   
>  I am not able to see any xx.inprogress file on S3, though. Has anyone had 
> this problem before? Otherwise, I would take it as a bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
