[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r59487598
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
--- End diff --
Can we remove this stack trace? It's not helpful (the error is always thrown from `createFileSystem`).
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
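A minimal sketch of what the review comment on `createFileManager` asks for: log the fallback warning without attaching the exception, since its stack trace always points at the same place. To stay runnable without Hadoop on the classpath, `PreferredManager`, `FallbackManager`, `UnsupportedBackendException` and `FileManagerFactory` are hypothetical stand-ins for `FileContextManager`, `FileSystemManager` and `UnsupportedFileSystemException`; this is an illustration, not the code that was merged.

```scala
import java.util.logging.Logger

// Hypothetical stand-ins: the PR itself builds on Hadoop's FileContext and FileSystem.
trait FileManager { def name: String }

final class UnsupportedBackendException(msg: String) extends Exception(msg)

// Fails in its constructor when the preferred backend is unavailable.
final class PreferredManager(supported: Boolean) extends FileManager {
  if (!supported) throw new UnsupportedBackendException("preferred backend unavailable")
  val name = "preferred"
}

final class FallbackManager extends FileManager {
  val name = "fallback"
}

object FileManagerFactory {
  private val log = Logger.getLogger("FileManagerFactory")

  def create(preferredSupported: Boolean): FileManager =
    try new PreferredManager(preferredSupported)
    catch {
      case _: UnsupportedBackendException =>
        // Message only: the throwable is deliberately not passed to the logger.
        // The trailing space keeps the two concatenated sentences apart.
        log.warning("Could not use the preferred API for managing the metadata log. " +
          "The log may be inconsistent under failures.")
        new FallbackManager
    }
}
```

Calling `FileManagerFactory.create(preferredSupported = false)` logs one warning line and returns the fallback manager.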
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11925
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201619344 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201619350 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54230/ Test PASSed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201618700
**[Test build #54230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54230/consoleFull)** for PR 11925 at commit [`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201611512
**[Test build #2696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2696/consoleFull)** for PR 11925 at commit [`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201589402 Alright I am merging this to master. Thanks @zsxwing
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201572014 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201572018 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54219/ Test PASSed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201571487
**[Test build #54219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54219/consoleFull)** for PR 11925 at commit [`b9e57b6`](https://github.com/apache/spark/commit/b9e57b626cf2a5aa9e93cbb3db815f94b72355ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201562753 **[Test build #54230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54230/consoleFull)** for PR 11925 at commit [`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201561829 **[Test build #2696 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2696/consoleFull)** for PR 11925 at commit [`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201525677 LGTM except some nits
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57493848
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
--- End diff --
nit `file` -> `srcPath`
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57493866
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
+     * Should throw FileNotFound exception if the file does not exist.
--- End diff --
nit: `file` -> `destPath`
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57493825
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
+     * Should throw FileNotFound exception if the file does not exist.
--- End diff --
nit: `FileNotFound exception` -> FileNotFoundException
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57493782
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
--- End diff --
nit: ename -> rename
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201520324 **[Test build #54219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54219/consoleFull)** for PR 11925 at commit [`b9e57b6`](https://github.com/apache/spark/commit/b9e57b626cf2a5aa9e93cbb3db815f94b72355ee).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201519398 The last test failures were due to the changes in SharedSQLContext and TestSQLContext. The last commit should fix that.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201439988 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201439993 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54194/ Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201439602
**[Test build #54194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54194/consoleFull)** for PR 11925 at commit [`5eb63b6`](https://github.com/apache/spark/commit/5eb63b67e86275e7c65a04067b233c85483f3af1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201393248 **[Test build #54194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54194/consoleFull)** for PR 11925 at commit [`5eb63b6`](https://github.com/apache/spark/commit/5eb63b67e86275e7c65a04067b233c85483f3af1).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201087805 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201087807 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54127/ Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201087799
**[Test build #54127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54127/consoleFull)** for PR 11925 at commit [`b926680`](https://github.com/apache/spark/commit/b926680f18d1e75518ca9237f6cf6dbefd813209).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201087615 **[Test build #54127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54127/consoleFull)** for PR 11925 at commit [`b926680`](https://github.com/apache/spark/commit/b926680f18d1e75518ca9237f6cf6dbefd813209).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57406100
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
+     * Should throw FileNotFound exception if the file does not exist.
+     */
+    def rename(srcPath: Path, destPath: Path): Unit
+
+    /** Recursively delete a path if it exists. Should not throw exception if file doesn't exist. */
+    def delete(path: Path): Unit
+  }
+
+  /**
+   * Default implementation of FileManager using newer FileContext API.
+   */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fc = if (path.toUri.getScheme == null) {
+      FileContext.getFileContext(hadoopConf)
+    } else {
+      FileContext.getFileContext(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fc.util.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      fc.rename(srcPath, destPath)
+    }
+
+    override def mkdirs(path: Path): Unit = {
+      fc.mkdir(path, FsPermission.getDirDefault, true)
+    }
+
+    override def open(path: Path): FSDataInputStream = {
+      fc.open(path)
+    }
+
+    override def create(path: Path): FSDataOutputStream = {
+      fc.create(path, EnumSet.of(CreateFlag.CREATE))
+    }
+
+    override def exists(path: Path): Boolean = {
+      fc.util().exists(path)
+    }
+
+    override def delete(path: Path): Unit = {
+      try {
+        fc.delete(path, true)
+      } catch {
+        case e: FileNotFoundException =>
+          // ignore if file has already been deleted
+      }
+    }
+  }
+
+  /**
+   * Implementation of FileManager using older FileSystem API. Note that this implementation
+   * cannot provide atomic renaming of paths, hence can lead to consistency issues. This
+   * should be used only as a backup option, when FileContextManager cannot be used.
+   */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fs = path.getFileSystem(hadoopConf)
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fs.listStatus(path, filter)
+    }
+
+    /**
+     * Rename a path. Note that this implementation is not atomic.
+     * @throws FileNotFoundException if source path does not exist.
+     * @throws FileAlreadyExistsException if destination path already exists.
+     * @throws IOException if renaming fails for some unknown reason.
+     */
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      if (!fs.exists(srcPath)) {
+        throw new FileNotFoundException(s"Source path already exists: $srcPath")
--- End diff --
nit: Source path already exists -> Cannot find ...
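The check-then-rename contract under review here (missing source raises `FileNotFoundException`, existing destination raises `FileAlreadyExistsException`, and the operation as a whole is not atomic) can be sketched with the JDK's `java.nio.file` API instead of Hadoop's `FileSystem`. The object name `NonAtomicRename` and the message wording are assumptions made for illustration; only the exception contract comes from the diff above.

```scala
import java.io.FileNotFoundException
import java.nio.file.{FileAlreadyExistsException, Files, Path}

object NonAtomicRename {

  /**
   * Check-then-rename. Not atomic: another writer can create destPath
   * between the existence checks and the move.
   */
  def rename(srcPath: Path, destPath: Path): Unit = {
    if (!Files.exists(srcPath)) {
      // The message describes the actual failure: the source is missing.
      throw new FileNotFoundException(s"Source path does not exist: $srcPath")
    }
    if (Files.exists(destPath)) {
      throw new FileAlreadyExistsException(destPath.toString)
    }
    // Without REPLACE_EXISTING, Files.move also refuses to overwrite an existing target.
    Files.move(srcPath, destPath)
  }
}
```

The gap between the two checks and the move is why the diff's Scaladoc describes the `FileSystem`-based manager as a backup option only.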
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201066015

**[Test build #54109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54109/consoleFull)** for PR 11925 at commit [`08f96a8`](https://github.com/apache/spark/commit/08f96a8aa0b3c12404eb7ef31128744fcf57b1e0).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201066025 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54109/ Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201066020 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-201065768 **[Test build #54109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54109/consoleFull)** for PR 11925 at commit [`08f96a8`](https://github.com/apache/spark/commit/08f96a8aa0b3c12404eb7ef31128744fcf57b1e0).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57381927

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +194,107 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog */
+  trait FileManager {
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+    def mkdirs(path: Path): Unit
+    def exists(path: Path): Boolean
+    def open(path: Path): FSDataInputStream
+    def create(path: Path): FSDataOutputStream
+    def rename(srcPath: Path, destPath: Path): Unit
+    def deleteOnExit(path: Path): Unit
+  }
+
+  /** Implementation of FileManager using newer FileContext API */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fc = if (path.toUri.getScheme == null) {
+      FileContext.getFileContext(hadoopConf)
+    } else {
+      FileContext.getFileContext(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fc.util.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      fc.rename(srcPath, destPath)
+    }
+
+    override def mkdirs(path: Path): Unit = {
+      fc.mkdir(path, FsPermission.getDirDefault, true)
+    }
+
+    override def open(path: Path): FSDataInputStream = {
+      fc.open(path)
+    }
+
+    override def create(path: Path): FSDataOutputStream = {
+      fc.create(path, EnumSet.of(CreateFlag.CREATE))
+    }
+
+    override def exists(path: Path): Boolean = {
+      fc.util().exists(path)
+    }
+
+    override def deleteOnExit(path: Path): Unit = {
+      fc.deleteOnExit(path)
+    }
+  }
+
+  /** Implementation of FileManager using older FileSystem API */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fs = if (path.toUri.getScheme == null) {
+      FileSystem.get(hadoopConf)
+    } else {
+      FileSystem.get(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fs.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      if (fs.exists(destPath)) {
+        throw new FileAlreadyExistsException(s"File already exists: $destPath")
+      }
+      fs.rename(srcPath, destPath)
--- End diff --

Yeah, that may be better.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200973256 Looks pretty good except the `fs.rename` issue and some nits.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57369347

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala ---
@@ -17,16 +17,28 @@
 package org.apache.spark.sql.execution.streaming
+import java.net.URI
 import java.util.ConcurrentModificationException
+import scala.util.Random
+
+import org.apache.hadoop.fs._
 import org.scalatest.concurrent.AsyncAssertions._
 import org.scalatest.time.SpanSugar._
-import org.apache.spark.SparkFunSuite
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.sql.execution.streaming.FakeFileSystem._
 import org.apache.spark.sql.test.SharedSQLContext
 class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {
+
--- End diff --

nit: extra line
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57367862

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +194,107 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog */
+  trait FileManager {
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+    def mkdirs(path: Path): Unit
+    def exists(path: Path): Boolean
+    def open(path: Path): FSDataInputStream
+    def create(path: Path): FSDataOutputStream
+    def rename(srcPath: Path, destPath: Path): Unit
+    def deleteOnExit(path: Path): Unit
+  }
+
+  /** Implementation of FileManager using newer FileContext API */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fc = if (path.toUri.getScheme == null) {
+      FileContext.getFileContext(hadoopConf)
+    } else {
+      FileContext.getFileContext(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fc.util.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      fc.rename(srcPath, destPath)
+    }
+
+    override def mkdirs(path: Path): Unit = {
+      fc.mkdir(path, FsPermission.getDirDefault, true)
+    }
+
+    override def open(path: Path): FSDataInputStream = {
+      fc.open(path)
+    }
+
+    override def create(path: Path): FSDataOutputStream = {
+      fc.create(path, EnumSet.of(CreateFlag.CREATE))
+    }
+
+    override def exists(path: Path): Boolean = {
+      fc.util().exists(path)
+    }
+
+    override def deleteOnExit(path: Path): Unit = {
+      fc.deleteOnExit(path)
+    }
+  }
+
+  /** Implementation of FileManager using older FileSystem API */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fs = if (path.toUri.getScheme == null) {
+      FileSystem.get(hadoopConf)
+    } else {
+      FileSystem.get(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fs.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      if (fs.exists(destPath)) {
+        throw new FileAlreadyExistsException(s"File already exists: $destPath")
+      }
+      fs.rename(srcPath, destPath)
--- End diff --

Unlike `fc.rename`, `fs.rename` will return `false` if it fails. We should throw an exception for that.
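A minimal sketch of the check requested in this comment, written against a hypothetical stand-in trait rather than the Hadoop `FileSystem` class so that it runs on its own. `BooleanRenameFs` and `CheckedRename` are illustrative names that do not appear in the patch, and `java.nio.file.FileAlreadyExistsException` is used here only to keep the example free of Hadoop dependencies. The point is that a rename reporting failure through a `false` return value gets converted into an exception, alongside explicit source and destination checks:

```scala
import java.io.{FileNotFoundException, IOException}
import java.nio.file.FileAlreadyExistsException

// Hypothetical stand-in for the two file-system calls involved; not a Hadoop type.
trait BooleanRenameFs {
  def exists(path: String): Boolean
  // Mirrors a contract where failure is reported by returning false, not by throwing.
  def rename(src: String, dst: String): Boolean
}

object CheckedRename {
  /** Rename src to dst and turn every failure into an exception. Not atomic. */
  def rename(fs: BooleanRenameFs, src: String, dst: String): Unit = {
    if (!fs.exists(src)) {
      throw new FileNotFoundException(s"Cannot find source path: $src")
    }
    if (fs.exists(dst)) {
      throw new FileAlreadyExistsException(dst)
    }
    // The Boolean result must be inspected, otherwise a failed rename goes unnoticed.
    if (!fs.rename(src, dst)) {
      throw new IOException(s"Failed to rename $src to $dst")
    }
  }
}
```

Because the existence checks and the rename are separate calls, another writer can still slip in between them; this is the non-atomic behavior the fallback implementation accepts.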
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/11925#discussion_r57367309

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +194,107 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog */
+  trait FileManager {
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+    def mkdirs(path: Path): Unit
+    def exists(path: Path): Boolean
+    def open(path: Path): FSDataInputStream
+    def create(path: Path): FSDataOutputStream
+    def rename(srcPath: Path, destPath: Path): Unit
+    def deleteOnExit(path: Path): Unit
+  }
+
+  /** Implementation of FileManager using newer FileContext API */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fc = if (path.toUri.getScheme == null) {
+      FileContext.getFileContext(hadoopConf)
+    } else {
+      FileContext.getFileContext(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fc.util.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      fc.rename(srcPath, destPath)
+    }
+
+    override def mkdirs(path: Path): Unit = {
+      fc.mkdir(path, FsPermission.getDirDefault, true)
+    }
+
+    override def open(path: Path): FSDataInputStream = {
+      fc.open(path)
+    }
+
+    override def create(path: Path): FSDataOutputStream = {
+      fc.create(path, EnumSet.of(CreateFlag.CREATE))
+    }
+
+    override def exists(path: Path): Boolean = {
+      fc.util().exists(path)
+    }
+
+    override def deleteOnExit(path: Path): Unit = {
+      fc.deleteOnExit(path)
+    }
+  }
+
+  /** Implementation of FileManager using older FileSystem API */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fs = if (path.toUri.getScheme == null) {
--- End diff --

You can just use `path.getFileSystem(hadoopConf)`.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200962816 test this please.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200701222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54006/ Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200701133

**[Test build #54006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54006/consoleFull)** for PR 11925 at commit [`dcf8096`](https://github.com/apache/spark/commit/dcf80967b6f85f30d34db5730ead611b83912f53).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200682795 **[Test build #54006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54006/consoleFull)** for PR 11925 at commit [`dcf8096`](https://github.com/apache/spark/commit/dcf80967b6f85f30d34db5730ead611b83912f53).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200612231 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53991/ Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200612221

**[Test build #53991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53991/consoleFull)** for PR 11925 at commit [`c490a25`](https://github.com/apache/spark/commit/c490a25a3f0606477575e9832c7100bc675081a3).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200612229 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200611865 **[Test build #53991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53991/consoleFull)** for PR 11925 at commit [`c490a25`](https://github.com/apache/spark/commit/c490a25a3f0606477575e9832c7100bc675081a3).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200610889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53990/ Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200610876

**[Test build #53990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53990/consoleFull)** for PR 11925 at commit [`f3dade7`](https://github.com/apache/spark/commit/f3dade7127a7ccde96c002abc6ad75dac5337d6d).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait FileManager`
  * `class FileContextManager(path: Path, hadoopConf: Configuration) extends FileManager`
  * `class FileSystemManager(path: Path, hadoopConf: Configuration) extends FileManager`
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200610470 @marmbrus @zsxwing @JoshRosen
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200610884 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11925#issuecomment-200610164 **[Test build #53990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53990/consoleFull)** for PR 11925 at commit [`f3dade7`](https://github.com/apache/spark/commit/f3dade7127a7ccde96c002abc6ad75dac5337d6d).
[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/11925

[SPARK-14109][SQL] Fix HDFSMetadataLog to fallback to FileSystem

## What changes were proposed in this pull request?

HDFSMetadataLog uses the newer FileContext API to achieve atomic renaming. However, FileContext implementations may not exist for many schemes for which there are FileSystem implementations. In those cases, rather than failing completely, we should fall back to the FileSystem-based implementation and log a warning that there may be file consistency issues if the log directory is concurrently modified.

## How was this patch tested?

Unit test. Cluster test pending.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-14109

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11925.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11925

commit f3dade7127a7ccde96c002abc6ad75dac5337d6d
Author: Tathagata Das
Date: 2016-03-24T01:40:00Z

    Fix HDFSMetadataLog to fallback to FileSystem
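The fallback described in this pull request can be sketched in isolation as follows. This is only an illustration of the pattern, not the patch itself: `Manager`, `AtomicManager`, `BestEffortManager`, `UnsupportedSchemeException` and `ManagerFactory` are hypothetical names, and treating `hdfs` as the only supported scheme is an assumption made purely so the example has something to reject. The preferred implementation fails at construction time for schemes it cannot serve, and the factory catches that specific exception, emits a warning, and returns the weaker implementation:

```scala
// Hypothetical stand-ins; none of these names are Spark or Hadoop types.
final class UnsupportedSchemeException(scheme: String)
  extends Exception(s"No atomic-rename implementation for scheme: $scheme")

trait Manager {
  def atomicRename: Boolean
}

// Preferred implementation: refuses to construct for schemes it does not support.
// Assumption for illustration only: "hdfs" is the single supported scheme.
final class AtomicManager(scheme: String) extends Manager {
  if (scheme != "hdfs") throw new UnsupportedSchemeException(scheme)
  val atomicRename = true
}

// Backup implementation: always available, but renames are not atomic.
final class BestEffortManager extends Manager {
  val atomicRename = false
}

object ManagerFactory {
  /** Try the preferred manager first; fall back with a warning instead of failing. */
  def create(scheme: String, warn: String => Unit): Manager =
    try {
      new AtomicManager(scheme)
    } catch {
      case e: UnsupportedSchemeException =>
        warn(s"${e.getMessage}. Falling back; the log may be inconsistent under failures.")
        new BestEffortManager
    }
}
```

Catching only the "unsupported" exception keeps unrelated construction failures visible instead of silently downgrading to the weaker implementation.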