[jira] [Comment Edited] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773594#comment-15773594
 ] 

Chen He edited comment on SPARK-18988 at 12/23/16 8:06 PM:
---

Does "automatically" mean the HistoryServer will create it during startup, 
rather than customer-submitted Spark on YARN jobs creating it? Is that correct? 
I would really appreciate it if you could give us at least 24 hours to answer 
your proposed question instead of closing the issue directly.


was (Author: airbots):
The "automatically" means HistoryServer will create it during startup. Is it 
correct? Really appreciate if you can give us at least 24 hours to answer your 
proposed question instead of close it directly.

> Spark "spark.eventLog.dir" dir should create the directory if it is different 
> from "spark.history.fs.logDirectory"
> --
>
> Key: SPARK-18988
> URL: https://issues.apache.org/jira/browse/SPARK-18988
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.6.1, 2.1.0
>Reporter: Chen He
>Priority: Minor
>
> When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
> "spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, it reports the 
> following error. 
> ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.FileNotFoundException: File does not exist: 
> hdfs:/spark-history/eventLog
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:549)
>   at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
>   at com.oracle.test.logs.Main.main(Main.java:13)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
> If the Spark event log directory has to be the same as 
> "spark.history.fs.logDirectory", why does "spark.eventLog.dir" exist? If not, 
> EventLoggingListener.start() should try to create this directory instead of 
> simply throwing an exception. 
> {code}
>   def start() {
> if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
>   throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
> exist.")
> }
> {code}
> This causes confusion, and at the same time the Spark documentation does not 
> make it clear:
> {quote}
>   Base directory in which Spark events are logged, if 
> spark.eventLog.enabled is true. *Within this base directory* (???you must 
> make sure it already exists???), Spark creates a sub-directory for each 
> application, and logs the events specific to the application in this 
> directory. Users may want to set this to a unified location like an HDFS 
> directory so history files can be read by the history server.
> {quote}
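> For reference, a configuration that avoids this error keeps both settings 
> pointing at a directory that already exists. A minimal sketch, assuming the 
> paths from this report and a standard HDFS client:
> {code}
> # spark-defaults.conf
> spark.eventLog.enabled           true
> spark.eventLog.dir               hdfs:///spark-history
> spark.history.fs.logDirectory    hdfs:///spark-history
>
> # pre-create the base directory once, since Spark will not create it:
> #   hdfs dfs -mkdir -p /spark-history
> {code}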



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773594#comment-15773594
 ] 

Chen He commented on SPARK-18988:
-

Does "automatically" mean the HistoryServer will create it during startup? Is 
that correct? I would really appreciate it if you could give us at least 24 
hours to answer your proposed question instead of closing the issue directly.




[jira] [Comment Edited] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773186#comment-15773186
 ] 

Chen He edited comment on SPARK-18988 at 12/23/16 4:18 PM:
---

OK, if this is reasonable, why can "spark.history.fs.logDirectory" be created 
automatically but "spark.eventLog.dir" cannot? 
What is the difference between the eventLog and the history? 

As a system, if the user has already clearly provided a directory, either in 
the original config or in a job config, it means they know where to create the 
eventLog. They can also get this config info from the job .xml file after or 
during the job run. How does that become "accidentally silently"? Sorry, it 
does not make sense to me. Reopening it.


was (Author: airbots):
OK, if this is reasonable, why "spark.history.fs.logDirectory" can be 
automatically created but "spark.eventLog.dir" can not? 
What is the difference between eventLog and the history? 

As a system, if user has already clearly provided a directory either in 
original config or a job config. It means they know where to created the 
eventLog. And they can also get this config info from job .xml file after or 
during job running. How it becomes "accidentally silently"? Sorry, it does not 
make sense to me. Reopen it.




[jira] [Comment Edited] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773186#comment-15773186
 ] 

Chen He edited comment on SPARK-18988 at 12/23/16 4:16 PM:
---

OK, if this is reasonable, why can "spark.history.fs.logDirectory" be created 
automatically but "spark.eventLog.dir" cannot? 
What is the difference between the eventLog and the history? 

As a system, if the user has already clearly provided a directory, either in 
the original config or in a job config, it means they know where to create the 
eventLog. They can also get this config info from the job .xml file after or 
during the job run. How does that become "accidentally silently"? Sorry, it 
does not make sense to me. Reopening it.


was (Author: airbots):
OK, if this is reasonable, why "spark.history.fs.logDirectory" can be 
automatically created but "spark.eventLog.dir" can not? 
What is the difference between eventLog and the history? 

As a system, if user has already clearly provided a directory either in 
original config or a job directory. It means they know where to created the 
eventLog. How it becomes "accidentally silently"? Sorry, it does not make sense 
to me. Reopen it.




[jira] [Comment Edited] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773186#comment-15773186
 ] 

Chen He edited comment on SPARK-18988 at 12/23/16 4:10 PM:
---

OK, if this is reasonable, why can "spark.history.fs.logDirectory" be created 
automatically but "spark.eventLog.dir" cannot? 
What is the difference between the eventLog and the history? 

As a system, if the user has already clearly provided a directory, either in 
the original config or in a job config, it means they know where to create the 
eventLog. How does that become "accidentally silently"? Sorry, it does not 
make sense to me. Reopening it.


was (Author: airbots):
OK, if this is reasonable, why "spark.history.fs.logDirectory" can be 
automatically created but "spark.eventLog.dir" can not? 
What is the difference between eventLog and the history? 

As a system, if user has already clearly provide a directory either in original 
config or a job directory. It means they know where to created the eventLog. 
How it becomes "accidentally silently"? Sorry, it does not make sense to me. 
Reopen it.




[jira] [Reopened] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He reopened SPARK-18988:
-




[jira] [Commented] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773186#comment-15773186
 ] 

Chen He commented on SPARK-18988:
-

OK, if this is reasonable, why can "spark.history.fs.logDirectory" be created 
automatically but "spark.eventLog.dir" cannot? 
What is the difference between the eventLog and the history? 

As a system, if the user has already clearly provided a directory, either in 
the original config or in a job config, it means they know where to create the 
eventLog. How does that become "accidentally silently"? Sorry, it does not 
make sense to me. Reopening it.




[jira] [Updated] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated SPARK-18988:

Affects Version/s: 2.1.0




[jira] [Commented] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772317#comment-15772317
 ] 

Chen He commented on SPARK-18988:
-

It could be something like (note that getFileStatus throws a 
FileNotFoundException when the path is missing, so the existence case has to 
be handled in a catch, not in the if): 
{code}
// needs: import java.io.FileNotFoundException
try {
  if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDirectory) {
    throw new IllegalArgumentException(
      s"Log path $logBaseDir exists but is not a directory.")
  }
} catch {
  case _: FileNotFoundException =>
    // The base directory is missing: create it instead of failing.
    if (!fileSystem.mkdirs(new Path(logBaseDir))) {
      throw new IllegalArgumentException(
        s"Cannot create log directory $logBaseDir.")
    }
}
{code}
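The same check-then-create pattern can be sketched against a local filesystem 
in Python (a hypothetical illustration, not Spark code; `ensure_log_dir` is an 
assumed helper name):

```python
import os
import tempfile


def ensure_log_dir(log_base_dir: str) -> str:
    """Create the directory if it is missing, instead of failing outright."""
    if os.path.exists(log_base_dir):
        if not os.path.isdir(log_base_dir):
            raise ValueError(f"Log path {log_base_dir} exists but is not a directory")
    else:
        # Mirror of the proposed fix: create the whole chain, like fs.mkdirs.
        os.makedirs(log_base_dir)
    return log_base_dir


# Demo: a nested path such as <base>/spark-history/eventLog is created on demand.
base = tempfile.mkdtemp()
event_log = ensure_log_dir(os.path.join(base, "spark-history", "eventLog"))
```

Calling it a second time on the same path is a no-op, which is the behavior 
the reporter is asking for from EventLoggingListener.start().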

> Spark "spark.eventLog.dir" dir should create the directory if it is different 
> from "spark.history.fs.logDirectory"
> --
>
> Key: SPARK-18988
> URL: https://issues.apache.org/jira/browse/SPARK-18988
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.6.1
>Reporter: Chen He
>Priority: Minor
>
> When set "spark.history.fs.logDirectory" to be hdfs:///spark-history but set 
> "spark.eventLog.dir" to be hdfs:///spark-history/eventLog. It reports 
> following error. 
> ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.FileNotFoundException: File does not exist: 
> hdfs:/spark-history/eventLog
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
>   at org.apache.spark.SparkContext.(SparkContext.scala:549)
>   at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:59)
>   at com.oracle.test.logs.Main.main(Main.java:13)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
> If "spark.eventLog.dir" has to be the same as "spark.history.fs.logDirectory", 
> why does it exist as a separate setting? If not, EventLoggingListener.start() 
> should try to create this directory instead of simply throwing an exception. 
> {code}
>   def start() {
> if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
>   throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
> exist.")
> }
> {code}
> This causes confusion; at the same time, the Spark documentation does not make 
> it clear:
> {quote}
>   Base directory in which Spark events are logged, if 
> spark.eventLog.enabled is true. *Within this base directory* (???you must 
> make sure it already exists???), Spark creates a sub-directory for each 
> application, and logs the events specific to the application in this 
> directory. Users may want to set this to a unified location like an HDFS 
> directory so history files can be read by the history server.
> {quote}
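
For reference, the misconfiguration reported above corresponds to a 
spark-defaults.conf along these lines (the paths are the ones from the report: 
the history server reads the base directory while applications log under a 
sub-path of it):

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-history/eventLog
spark.history.fs.logDirectory    hdfs:///spark-history
```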






[jira] [Updated] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated SPARK-18988:

Description: 
When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
"spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, Spark reports the 
following error. 

ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.(SparkContext.scala:549)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:59)
at com.oracle.test.logs.Main.main(Main.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)

If "spark.eventLog.dir" has to be the same as "spark.history.fs.logDirectory", 
why does it exist as a separate setting? If not, EventLoggingListener.start() 
should try to create this directory instead of simply throwing an exception. 

{code}
  def start() {
if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
  throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
exist.")
}
{code}

This causes confusion; at the same time, the Spark documentation does not make it clear:
{quote}
Base directory in which Spark events are logged, if 
spark.eventLog.enabled is true. *Within this base directory* (???you must make 
sure it already exists???), Spark creates a sub-directory for each application, 
and logs the events specific to the application in this directory. Users may 
want to set this to a unified location like an HDFS directory so history files 
can be read by the history server.
{quote}


  was:
When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
"spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, Spark reports the 
following error. 

ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.(SparkContext.scala:549)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:59)
at com.oracle.test.logs.Main.main(Main.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)

If "spark.eventLog.dir" has to be the same as "spark.history.fs.logDirectory", 
why does it exist as a separate setting? If not, EventLoggingListener.start() 
should try to create this directory instead of throwing an exception. 

{code}
  def start() {
if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
  throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
exist.")
}
{code}

This causes confusion; at the same time, the Spark documentation does not make it clear:
{quote}
Base directory in which Spark events are logged, if 
spark.eventLog.enabled is true. *Within this base directory* (???you must make 
sure it already exists???), Spark creates a sub-directory for each application, 
and logs the events specific to the application in this directory. Users may 
want to set this to a unified location like an HDFS directory so history files 
can be read by the history server.
{quote}



> Spark "spark.eventLog.dir" dir should create the directory 

[jira] [Updated] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated SPARK-18988:

Description: 
When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
"spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, Spark reports the 
following error. 

ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.(SparkContext.scala:549)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:59)
at com.oracle.test.logs.Main.main(Main.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)

If "spark.eventLog.dir" has to be the same as "spark.history.fs.logDirectory", 
why does it exist as a separate setting? If not, EventLoggingListener.start() 
should try to create this directory instead of throwing an exception. 

{code}
  def start() {
if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
  throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
exist.")
}
{code}

This causes confusion; at the same time, the Spark documentation does not make it clear:
{quote}
Base directory in which Spark events are logged, if 
spark.eventLog.enabled is true. *Within this base directory* (???you must make 
sure it already exists???), Spark creates a sub-directory for each application, 
and logs the events specific to the application in this directory. Users may 
want to set this to a unified location like an HDFS directory so history files 
can be read by the history server.
{quote}


  was:
When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
"spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, Spark reports the 
following error. 

ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.(SparkContext.scala:549)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:59)
at com.oracle.test.logs.Main.main(Main.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)

If "spark.eventLog.dir" has to be the same as "spark.history.fs.logDirectory", 
why does it exist as a separate setting? If not, EventLoggingListener.start() 
should try to create this directory instead of throwing an exception. 

{code}
  def start() {
if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
  throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
exist.")
}
{code}



> Spark "spark.eventLog.dir" dir should create the directory if it is different 
> from "spark.history.fs.logDirectory"
> --
>
> Key: SPARK-18988
> URL: https://issues.apache.org/jira/browse/SPARK-18988
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.6.1
>Reporter: Chen He
>Priority: Minor
>
> When set "spark.history.fs.logDirectory" to be 

[jira] [Created] (SPARK-18988) Spark "spark.eventLog.dir" dir should create the directory if it is different from "spark.history.fs.logDirectory"

2016-12-23 Thread Chen He (JIRA)
Chen He created SPARK-18988:
---

 Summary: Spark "spark.eventLog.dir" dir should create the 
directory if it is different from "spark.history.fs.logDirectory"
 Key: SPARK-18988
 URL: https://issues.apache.org/jira/browse/SPARK-18988
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.6.1
Reporter: Chen He
Priority: Minor


When "spark.history.fs.logDirectory" is set to hdfs:///spark-history but 
"spark.eventLog.dir" is set to hdfs:///spark-history/eventLog, Spark reports the 
following error. 

ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs:/spark-history/eventLog
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.(SparkContext.scala:549)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:59)
at com.oracle.test.logs.Main.main(Main.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)

If "spark.eventLog.dir" has to be the same as "spark.history.fs.logDirectory", 
why does it exist as a separate setting? If not, EventLoggingListener.start() 
should try to create this directory instead of throwing an exception. 

{code}
  def start() {
if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDir) {
  throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
exist.")
}
{code}







[jira] [Created] (SPARK-18968) .sparkStaging quickly fill up HDFS

2016-12-21 Thread Chen He (JIRA)
Chen He created SPARK-18968:
---

 Summary: .sparkStaging quickly fill up HDFS
 Key: SPARK-18968
 URL: https://issues.apache.org/jira/browse/SPARK-18968
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 1.6.2
Reporter: Chen He


We are running streaming jobs using Spark. Even though 
"spark.yarn.preserve.staging.files" is set to "false", HDFS quickly fills up. 

Others have asked a similar question on the mailing list, but there was no 
further response: 
http://apache-spark-user-list.1001560.n3.nabble.com/HDFS-folder-sparkStaging-not-deleted-and-filled-up-HDFS-in-yarn-mode-td7851.html






[jira] [Commented] (SPARK-13628) Temporary intermediate output file should be renamed before copying to destination filesystem

2016-03-09 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187419#comment-15187419
 ] 

Chen He commented on SPARK-13628:
-

Really appreciate your reply, Mr. Sean Owen. I am new to Spark and am using it 
to talk to a blobstore. We hit a performance bottleneck exactly as described in 
this issue. Would you mind providing more detail about the process, or pointing 
me to where I can find those details, such as which class does the actual rename 
and copy? Thanks a lot!

> Temporary intermediate output file should be renamed before copying to 
> destination filesystem
> -
>
> Key: SPARK-13628
> URL: https://issues.apache.org/jira/browse/SPARK-13628
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 1.6.0
>Reporter: Chen He
>
> Spark executors dump a temporary file into a local temp dir, copy it to the 
> destination filesystem, and then rename it. This can be costly for blobstores 
> (such as OpenStack Swift) that perform an actual copy when a file is renamed. 
> If it does not affect other components, we could swap the order of copy and 
> rename so that Spark can use a blobstore as the final output destination.






[jira] [Created] (SPARK-13628) Temporary intermediate output file should be renamed before copying to destination filesystem

2016-03-02 Thread Chen He (JIRA)
Chen He created SPARK-13628:
---

 Summary: Temporary intermediate output file should be renamed 
before copying to destination filesystem
 Key: SPARK-13628
 URL: https://issues.apache.org/jira/browse/SPARK-13628
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output
Affects Versions: 1.6.0
Reporter: Chen He


Spark executors dump a temporary file into a local temp dir, copy it to the 
destination filesystem, and then rename it. This can be costly for blobstores 
(such as OpenStack Swift) that perform an actual copy when a file is renamed. If 
it does not affect other components, we could swap the order of copy and rename 
so that Spark can use a blobstore as the final output destination.
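
The cost asymmetry can be sketched in plain code; the method names below are 
illustrative only, not Spark's actual writer classes, and local filesystem moves 
stand in for the cheap operation while copies stand in for the expensive one:

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

// Current sequence: copy the temp file to the destination, then rename it
// there. On a blobstore, that final rename is itself a server-side copy,
// so the data effectively moves twice.
def copyThenRename(tmp: String, dest: String): Unit = {
  val staged = Paths.get(dest + ".inprogress")
  Files.copy(Paths.get(tmp), staged, StandardCopyOption.REPLACE_EXISTING)
  Files.move(staged, Paths.get(dest), StandardCopyOption.REPLACE_EXISTING)
}

// Proposed sequence: rename first (cheap, within the local filesystem),
// then perform a single copy to the destination.
def renameThenCopy(tmp: String, dest: String): Unit = {
  val finalLocal = Paths.get(tmp + ".final")
  Files.move(Paths.get(tmp), finalLocal, StandardCopyOption.REPLACE_EXISTING)
  Files.copy(finalLocal, Paths.get(dest), StandardCopyOption.REPLACE_EXISTING)
}
```

Both orders produce the same final file; the difference is only how many times 
the bytes cross the (potentially remote) destination filesystem.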






[jira] [Commented] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-02 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050381#comment-14050381
 ] 

Chen He commented on SPARK-2277:


This is interesting. I will take a look.

 Make TaskScheduler track whether there's host on a rack
 ---

 Key: SPARK-2277
 URL: https://issues.apache.org/jira/browse/SPARK-2277
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Rui Li

 When TaskSetManager adds a pending task, it checks whether the tasks's 
 preferred location is available. Regarding RACK_LOCAL task, we consider the 
 preferred rack available if such a rack is defined for the preferred host. 
 This is incorrect as there may be no alive hosts on that rack at all. 
 Therefore, TaskScheduler should track the hosts on each rack, and provide an 
 API for TaskSetManager to check if there's a host alive on a specific rack.
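
A minimal sketch of the tracking the description asks for (class and method 
names here are hypothetical, not Spark's actual scheduler API): the scheduler 
maintains a rack-to-live-hosts index, and a rack preference is only considered 
satisfiable while that index has at least one host for the rack.

```scala
import scala.collection.mutable

class RackIndex {
  // rack -> set of currently alive hosts on that rack
  private val hostsByRack = new mutable.HashMap[String, mutable.Set[String]]

  def addExecutorHost(host: String, rack: String): Unit =
    hostsByRack.getOrElseUpdate(rack, mutable.Set.empty) += host

  def removeExecutorHost(host: String, rack: String): Unit =
    hostsByRack.get(rack).foreach { hosts =>
      hosts -= host
      // Drop the rack entry entirely once no alive host remains,
      // so the check below reflects reality.
      if (hosts.isEmpty) hostsByRack -= rack
    }

  // The check a TaskSetManager would make before treating a RACK_LOCAL
  // preference as available.
  def hasAliveHostOnRack(rack: String): Boolean = hostsByRack.contains(rack)
}
```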


