[jira] [Comment Edited] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

Chen He (JIRA) Fri, 23 Dec 2016 08:17:11 -0800

    [ https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773186#comment-15773186 ]


Chen He edited comment on SPARK-18988 at 12/23/16 4:16 PM:
-----------------------------------------------------------

OK, if this is reasonable, why can "spark.history.fs.logDirectory" be 
created automatically while "spark.eventLog.dir" cannot? 
What is the difference between the eventLog and the history? 

From a system point of view, if the user has already explicitly provided a 
directory, either in the original config or in a job config, they know where 
the eventLog should be created. They can also get this config info from the 
job .xml file during or after the job run. How does that become "accidentally 
silently"? Sorry, it does not make sense to me. Reopening it.


was (Author: airbots):
OK, if this is reasonable, why can "spark.history.fs.logDirectory" be 
created automatically while "spark.eventLog.dir" cannot? 
What is the difference between the eventLog and the history? 

From a system point of view, if the user has already explicitly provided a 
directory, either in the original config or in a job directory, they know where 
the eventLog should be created. How does that become "accidentally silently"? 
Sorry, it does not make sense to me. Reopening it.

> Spark "spark.eventLog.dir" dir should create the directory if it is different 
> from "spark.history.fs.logDirectory"
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18988
>                 URL: https://issues.apache.org/jira/browse/SPARK-18988
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.1, 2.1.0
>            Reporter: Chen He
>            Priority: Minor
>
> When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
> "spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, Spark reports 
> the following error:
> ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
>       at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
>       at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
>       at org.apache.spark.SparkContext.<init>(SparkContext.scala:549)
>       at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
>       at com.oracle.test.logs.Main.main(Main.java:13)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
> If the Spark event log directory has to be the same as 
> "spark.history.fs.logDirectory", why does "spark.eventLog.dir" exist at all? 
> If not, EventLoggingListener.start() should try to create this directory 
> instead of simply throwing an exception:
> {code}
>   def start() {
>     if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
>       throw new IllegalArgumentException(s"Log directory $logBaseDir does not exist.")
>     }
>     // ... remainder of start() elided ...
>   }
> {code}
> This causes confusion, and the Spark documentation does not make it clear 
> either:
> {quote}
>       Base directory in which Spark events are logged, if 
> spark.eventLog.enabled is true. *Within this base directory* (???you must 
> make sure it already exists???), Spark creates a sub-directory for each 
> application, and logs the events specific to the application in this 
> directory. Users may want to set this to a unified location like an HDFS 
> directory so history files can be read by the history server.
> {quote}
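The change the reporter asks for (create the directory instead of failing) might look roughly like the sketch below. It is a minimal, hypothetical sketch, not Spark's code: it uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem, and the names EventLogDirs and ensureLogDir are invented for illustration. A path that exists but is not a directory is still rejected, which keeps the intent of the current check.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of Spark) sketching the behaviour requested
// in SPARK-18988. java.nio.file on the local filesystem stands in for
// Hadoop's FileSystem API.
object EventLogDirs {

  /** Ensures `logBaseDir` exists as a directory and returns it.
    * A missing directory (and any missing parents) is created, like `mkdir -p`;
    * a path that exists but is not a directory is still rejected.
    */
  def ensureLogDir(logBaseDir: Path): Path = {
    if (Files.exists(logBaseDir)) {
      // Keep the spirit of the existing check: an existing non-directory is an error.
      if (!Files.isDirectory(logBaseDir)) {
        throw new IllegalArgumentException(s"Log directory $logBaseDir is not a directory.")
      }
    } else {
      // Create the directory tree instead of throwing.
      Files.createDirectories(logBaseDir)
    }
    logBaseDir
  }
}
```

Against HDFS, the same shape would presumably use the Hadoop FileSystem calls exists, getFileStatus(...).isDirectory and mkdirs (which, like createDirectories, also creates missing parents). Note also that, per the stack trace, the current start() never reaches its isDir check for a missing directory: getFileStatus itself throws FileNotFoundException first.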



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
