GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/18179
[SPARK-20894][SS] Resolve the checkpoint location in driver and use the resolved path in state store (branch-2.2)

## What changes were proposed in this pull request?

Backport #18149 to 2.2.

## How was this patch tested?

Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-20894-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18179

----
commit a611c4776f9b195fcba6e23338d134071f28c87e
Author: Shixiong Zhu <shixi...@databricks.com>
Date:   2017-06-01T00:24:37Z

    [SPARK-20894][SS] Resolve the checkpoint location in driver and use the resolved path in state store

    When the user runs a Structured Streaming query in a cluster, if the driver uses the local file system, StateStore running in executors will throw a file-not-found exception. However, the current error is not obvious.

    This PR makes StreamExecution resolve the path in driver and uses the full path including the scheme part (such as `hdfs:/`, `file:/`) in StateStore. Then if the above error happens, StateStore will throw an error with this full path which starts with `file:/`, and it makes this error obvious: the checkpoint location is on the local file system.

    One potential minor issue is that the user cannot use different default file system settings in driver and executors (e.g., use a public HDFS address in driver and a private HDFS address in executors) after this change. However, since the batch query also has this issue (See https://github.com/apache/spark/blob/4bb6a53ebd06de3de97139a2dbc7c85fc3aa3e66/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L402), it doesn't make things worse.

    The new added test.

    Author: Shixiong Zhu <shixi...@databricks.com>

    Closes #18149 from zsxwing/SPARK-20894.
----
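The behaviour described in the commit message (resolve the checkpoint path once on the driver, then hand executors a path that already carries its scheme) can be illustrated with a small standalone sketch. This is not Spark's or Hadoop's code: `qualify`, its parameters, and the example file system addresses are hypothetical stand-ins for the path qualification that the real change performs through the Hadoop file system API.

```python
from urllib.parse import urlparse
import posixpath


def qualify(path: str, default_fs: str, working_dir: str = "/") -> str:
    """Return `path` with an explicit scheme (hypothetical helper).

    A path that already has a scheme is returned unchanged; otherwise it is
    made absolute and prefixed with `default_fs`, the default file system of
    whichever process is calling.
    """
    if urlparse(path).scheme:
        return path  # already fully qualified, the caller's default FS is ignored
    if not posixpath.isabs(path):
        path = posixpath.join(working_dir, path)
    return default_fs.rstrip("/") + posixpath.normpath(path)


# Assumed setup: the driver defaults to the local file system, while the
# executors default to HDFS (the address is made up for illustration).
driver_default_fs = "file:///"
executor_default_fs = "hdfs://namenode:8020"

# Resolve once on the driver; the result carries the `file:` scheme.
resolved_on_driver = qualify("/tmp/checkpoint", driver_default_fs)

# An executor that receives the resolved path sees the same location,
# so any file-not-found error would name a path starting with `file:/`.
seen_on_executor = qualify(resolved_on_driver, executor_default_fs)

# Without driver-side resolution the executor would silently pick HDFS.
unresolved_on_executor = qualify("/tmp/checkpoint", executor_default_fs)
```

In this sketch `resolved_on_driver` and `seen_on_executor` are both `file:/tmp/checkpoint`, while `unresolved_on_executor` is `hdfs://namenode:8020/tmp/checkpoint`. It also shows the trade-off noted in the commit message: once the path is qualified on the driver, the executors' own default file system setting no longer affects where the checkpoint is looked up.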