[ https://issues.apache.org/jira/browse/SPARK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379376#comment-15379376 ]
Thomas Graves commented on SPARK-5311:
--------------------------------------

Note, as a follow-up: you can of course configure it to be any directory you want, but the idea is that the event log dir is a shared place where people put their log files so that the history server can read them. The history server needs to know where to find them and needs permission to read them.

> EventLoggingListener throws exception if log directory does not exist
> ---------------------------------------------------------------------
>
> Key: SPARK-5311
> URL: https://issues.apache.org/jira/browse/SPARK-5311
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Blocker
>
> If the log directory does not exist, EventLoggingListener throws an IllegalArgumentException. Here's a simple reproduction (using the master branch (1.3.0)):
> {code}
> ./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=/tmp/nonexistent-dir
> {code}
> where /tmp/nonexistent-dir is a directory that doesn't exist and /tmp exists. This results in the following exception:
> {code}
> 15/01/18 17:10:44 INFO HttpServer: Starting HTTP Server
> 15/01/18 17:10:44 INFO Utils: Successfully started service 'HTTP file server' on port 62729.
> 15/01/18 17:10:44 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 15/01/18 17:10:44 INFO Utils: Successfully started service 'SparkUI' on port 4041.
> 15/01/18 17:10:44 INFO SparkUI: Started SparkUI at http://joshs-mbp.att.net:4041
> 15/01/18 17:10:45 INFO Executor: Using REPL class URI: http://192.168.1.248:62726
> 15/01/18 17:10:45 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkdri...@joshs-mbp.att.net:62728/user/HeartbeatReceiver
> 15/01/18 17:10:45 INFO NettyBlockTransferService: Server created on 62730
> 15/01/18 17:10:45 INFO BlockManagerMaster: Trying to register BlockManager
> 15/01/18 17:10:45 INFO BlockManagerMasterActor: Registering block manager localhost:62730 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 62730)
> 15/01/18 17:10:45 INFO BlockManagerMaster: Registered BlockManager
> java.lang.IllegalArgumentException: Log directory /tmp/nonexistent-dir does not exist.
>     at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:90)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:363)
>     at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:986)
>     at $iwC$$iwC.<init>(<console>:9)
>     at $iwC.<init>(<console>:18)
>     at <init>(<console>:20)
>     at .<init>(<console>:24)
>     at .<clinit>(<console>)
>     at .<init>(<console>:7)
>     at .<clinit>(<console>)
>     at $print(<console>)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
>     at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123)
>     at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122)
>     at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:270)
>     at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122)
>     at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:60)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:147)
>     at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:60)
>     at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106)
>     at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:60)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:962)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>     at org.apache.spark.repl.Main.main(Main.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> It looks like the directory existence check was introduced in https://github.com/apache/spark/commit/456451911d11cc0b6738f31b1e17869b1fb51c87?diff=unified. This is a change of behavior / regression from earlier Spark versions, which would create the event log directory if it did not exist.
> I think the intent of this check may have been to handle cases where the event directory path corresponds to an existing file, so maybe we can guard the `!isDirectory` check with an `exists` check first and change the error message to be more specific.
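A minimal sketch of the guard proposed in the issue description, written as a shell function against the local filesystem: reject the path only when it exists and is not a directory, and otherwise create it if it is missing (the pre-1.3 behavior). The function name `ensure_event_log_dir` is hypothetical. Spark's EventLoggingListener is Scala code that goes through the Hadoop FileSystem API, so this only illustrates the order of the checks, not Spark's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark) illustrating the proposed order of
# checks: an `exists` test guards the "is not a directory" test, and a missing
# directory is created instead of being treated as an error.
ensure_event_log_dir() {
  local dir="$1"
  if [ -e "$dir" ] && [ ! -d "$dir" ]; then
    # The path exists but is a regular file (or similar): fail with a
    # message specific to that case.
    echo "Event log path $dir exists but is not a directory." >&2
    return 1
  fi
  if [ ! -e "$dir" ]; then
    # The path does not exist: create it, including any missing parents.
    mkdir -p "$dir" || return 1
  fi
  return 0
}
```

For example, `ensure_event_log_dir /tmp/nonexistent-dir` creates the directory and returns 0, so the reproduction command should then get past the existence check; pointing it at an existing regular file prints the more specific error and returns 1. Creating the directory on demand does not by itself address the history-server concern from the comment: the history server still has to be configured to read the same directory and have permission to read the files in it.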