[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-04-12 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r59487598
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
--- End diff --

Can we remove this stack trace? It's not helpful (the error is always 
thrown from `createFileSystem`).
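
A minimal sketch of the suggestion, assuming nothing about Spark's logging beyond what the diff shows: the fallback path logs only the warning message and does not hand the caught exception to the logger, so no stack trace is printed. `Manager`, `PrimaryManager`, `FallbackManager`, and `createManager` are hypothetical stand-ins for the classes in the diff, and `java.util.logging` stands in for Spark's `logWarning`.

```scala
import java.util.logging.Logger

// Hypothetical stand-ins for FileManager, FileContextManager and FileSystemManager.
trait Manager { def name: String }

final class PrimaryManager(supported: Boolean) extends Manager {
  // Simulates the preferred API being unavailable for this file system.
  if (!supported) throw new UnsupportedOperationException("primary API not supported")
  val name = "primary"
}

final class FallbackManager extends Manager { val name = "fallback" }

object FallbackSketch {
  private val log = Logger.getLogger("FallbackSketch")

  def createManager(primarySupported: Boolean): Manager =
    try new PrimaryManager(primarySupported)
    catch {
      case _: UnsupportedOperationException =>
        // Only the message is logged; the exception is deliberately not
        // passed to the logger, so no stack trace appears in the output.
        log.warning("Could not use the primary API; falling back. " +
          "The log may be inconsistent under failures.")
        new FallbackManager
    }
}
```

Dropping the exception argument is reasonable here only because, as the comment says, the error always originates from the same place; in other situations the stack trace may still be worth keeping.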


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11925





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201619344
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201619350
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54230/
Test PASSed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201618700
  
**[Test build #54230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54230/consoleFull)**
 for PR 11925 at commit 
[`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201611512
  
**[Test build #2696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2696/consoleFull)**
 for PR 11925 at commit 
[`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201589402
  
Alright, I am merging this to master. Thanks @zsxwing






[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201572014
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201572018
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54219/
Test PASSed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201571487
  
**[Test build #54219 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54219/consoleFull)**
 for PR 11925 at commit 
[`b9e57b6`](https://github.com/apache/spark/commit/b9e57b626cf2a5aa9e93cbb3db815f94b72355ee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201562753
  
**[Test build #54230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54230/consoleFull)**
 for PR 11925 at commit 
[`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201561829
  
**[Test build #2696 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2696/consoleFull)**
 for PR 11925 at commit 
[`d837c32`](https://github.com/apache/spark/commit/d837c3207c17000159f2ca5e77e93efe6b24705d).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201525677
  
LGTM except for some nits





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57493848
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
--- End diff --

nit: `file` -> `srcPath`





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57493866
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
+     * Should throw FileNotFound exception if the file does not exist.
--- End diff --

nit: `file` -> `destPath`





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57493825
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
+     * Should throw FileNotFound exception if the file does not exist.
--- End diff --

nit: `FileNotFound exception` -> `FileNotFoundException`





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57493782
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
--- End diff --

nit: ename -> rename





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201520324
  
**[Test build #54219 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54219/consoleFull)**
 for PR 11925 at commit 
[`b9e57b6`](https://github.com/apache/spark/commit/b9e57b626cf2a5aa9e93cbb3db815f94b72355ee).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201519398
  
The last test failures were due to the changes in SharedSQLContext and 
TestSQLContext. The last commit should fix that. 





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201439988
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201439993
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54194/
Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201439602
  
**[Test build #54194 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54194/consoleFull)**
 for PR 11925 at commit 
[`5eb63b6`](https://github.com/apache/spark/commit/5eb63b67e86275e7c65a04067b233c85483f3af1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201393248
  
**[Test build #54194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54194/consoleFull)**
 for PR 11925 at commit 
[`5eb63b6`](https://github.com/apache/spark/commit/5eb63b67e86275e7c65a04067b233c85483f3af1).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201087805
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201087807
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54127/
Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201087799
  
**[Test build #54127 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54127/consoleFull)**
 for PR 11925 at commit 
[`b926680`](https://github.com/apache/spark/commit/b926680f18d1e75518ca9237f6cf6dbefd813209).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201087615
  
**[Test build #54127 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54127/consoleFull)**
 for PR 11925 at commit 
[`b926680`](https://github.com/apache/spark/commit/b926680f18d1e75518ca9237f6cf6dbefd813209).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57406100
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ---
@@ -196,4 +195,148 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
     }
     None
   }
+
+  private def createFileManager(): FileManager = {
+    val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+    try {
+      new FileContextManager(metadataPath, hadoopConf)
+    } catch {
+      case e: UnsupportedFileSystemException =>
+        logWarning("Could not use FileContext API for managing metadata log file. The log may be" +
+          "inconsistent under failures.", e)
+        new FileSystemManager(metadataPath, hadoopConf)
+    }
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed by HDFSMetadataLog. */
+  trait FileManager {
+
+    /** List the files in a path that matches a filter. */
+    def list(path: Path, filter: PathFilter): Array[FileStatus]
+
+    /** Make directory at the give path and all its parent directories as needed. */
+    def mkdirs(path: Path): Unit
+
+    /** Whether path exists */
+    def exists(path: Path): Boolean
+
+    /** Open a file for reading, or throw exception if it does not exist. */
+    def open(path: Path): FSDataInputStream
+
+    /** Create path, or throw exception if it already exists */
+    def create(path: Path): FSDataOutputStream
+
+    /**
+     * Atomically ename path, or throw exception if it cannot be done.
+     * Should throw FileAlreadyExistsException if file already exists.
+     * Should throw FileNotFound exception if the file does not exist.
+     */
+    def rename(srcPath: Path, destPath: Path): Unit
+
+    /** Recursively delete a path if it exists. Should not throw exception if file doesn't exist. */
+    def delete(path: Path): Unit
+  }
+
+  /**
+   * Default implementation of FileManager using newer FileContext API.
+   */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fc = if (path.toUri.getScheme == null) {
+      FileContext.getFileContext(hadoopConf)
+    } else {
+      FileContext.getFileContext(path.toUri, hadoopConf)
+    }
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fc.util.listStatus(path, filter)
+    }
+
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      fc.rename(srcPath, destPath)
+    }
+
+    override def mkdirs(path: Path): Unit = {
+      fc.mkdir(path, FsPermission.getDirDefault, true)
+    }
+
+    override def open(path: Path): FSDataInputStream = {
+      fc.open(path)
+    }
+
+    override def create(path: Path): FSDataOutputStream = {
+      fc.create(path, EnumSet.of(CreateFlag.CREATE))
+    }
+
+    override def exists(path: Path): Boolean = {
+      fc.util().exists(path)
+    }
+
+    override def delete(path: Path): Unit = {
+      try {
+        fc.delete(path, true)
+      } catch {
+        case e: FileNotFoundException =>
+        // ignore if file has already been deleted
+      }
+    }
+  }
+
+  /**
+   * Implementation of FileManager using older FileSystem API. Note that this implementation
+   * cannot provide atomic renaming of paths, hence can lead to consistency issues. This
+   * should be used only as a backup option, when FileContextManager cannot be used.
+   */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends FileManager {
+    private val fs = path.getFileSystem(hadoopConf)
+
+    override def list(path: Path, filter: PathFilter): Array[FileStatus] = {
+      fs.listStatus(path, filter)
+    }
+
+    /**
+     * Rename a path. Note that this implementation is not atomic.
+     * @throws FileNotFoundException if source path does not exist.
+     * @throws FileAlreadyExistsException if destination path already exists.
+     * @throws IOException if renaming fails for some unknown reason.
+     */
+    override def rename(srcPath: Path, destPath: Path): Unit = {
+      if (!fs.exists(srcPath)) {
+        throw new FileNotFoundException(s"Source path already exists: $srcPath")
--- End diff --

nit: Source path already exists -> Cannot find ...



[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201066015
  
**[Test build #54109 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54109/consoleFull)**
 for PR 11925 at commit 
[`08f96a8`](https://github.com/apache/spark/commit/08f96a8aa0b3c12404eb7ef31128744fcf57b1e0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201066025
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54109/
Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201066020
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-201065768
  
**[Test build #54109 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54109/consoleFull)**
 for PR 11925 at commit 
[`08f96a8`](https://github.com/apache/spark/commit/08f96a8aa0b3c12404eb7ef31128744fcf57b1e0).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57381927
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
 ---
@@ -196,4 +194,107 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: 
SQLContext, path: String)
 }
 None
   }
+
+  private def createFileManager(): FileManager = {
+val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+try {
+  new FileContextManager(metadataPath, hadoopConf)
+} catch {
+  case e: UnsupportedFileSystemException =>
+logWarning("Could not use FileContext API for managing metadata 
log file. The log may be" +
+  "inconsistent under failures.", e)
+new FileSystemManager(metadataPath, hadoopConf)
+}
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed 
by HDFSMetadataLog */
+  trait FileManager {
+def list(path: Path, filter: PathFilter): Array[FileStatus]
+def mkdirs(path: Path): Unit
+def exists(path: Path): Boolean
+def open(path: Path): FSDataInputStream
+def create(path: Path): FSDataOutputStream
+def rename(srcPath: Path, destPath: Path): Unit
+def deleteOnExit(path: Path): Unit
+  }
+
+  /** Implementation of FileManager using newer FileContext API */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends 
FileManager {
+private val fc = if (path.toUri.getScheme == null) {
+  FileContext.getFileContext(hadoopConf)
+} else {
+  FileContext.getFileContext(path.toUri, hadoopConf)
+}
+
+override def list(path: Path, filter: PathFilter): Array[FileStatus] = 
{
+  fc.util.listStatus(path, filter)
+}
+
+override def rename(srcPath: Path, destPath: Path): Unit = {
+  fc.rename(srcPath, destPath)
+}
+
+override def mkdirs(path: Path): Unit = {
+  fc.mkdir(path, FsPermission.getDirDefault, true)
+}
+
+override def open(path: Path): FSDataInputStream = {
+  fc.open(path)
+}
+
+override def create(path: Path): FSDataOutputStream = {
+  fc.create(path, EnumSet.of(CreateFlag.CREATE))
+}
+
+override def exists(path: Path): Boolean = {
+  fc.util().exists(path)
+}
+
+override def deleteOnExit(path: Path): Unit = {
+  fc.deleteOnExit(path)
+}
+  }
+
+  /** Implementation of FileManager using older FileSystem API */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends 
FileManager {
+private val fs = if (path.toUri.getScheme == null) {
+  FileSystem.get(hadoopConf)
+} else {
+  FileSystem.get(path.toUri, hadoopConf)
+}
+
+override def list(path: Path, filter: PathFilter): Array[FileStatus] = 
{
+  fs.listStatus(path, filter)
+}
+
+override def rename(srcPath: Path, destPath: Path): Unit = {
+  if (fs.exists(destPath)) {
+throw new FileAlreadyExistsException(s"File already exists: 
$destPath")
+  }
+  fs.rename(srcPath, destPath)
--- End diff --

Yeah, that may be better. 





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200973256
  
Looks pretty good except the `fs.rename` issue and some nits.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57369347
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala
 ---
@@ -17,16 +17,28 @@
 
 package org.apache.spark.sql.execution.streaming
 
+import java.net.URI
 import java.util.ConcurrentModificationException
 
+import scala.util.Random
+
+import org.apache.hadoop.fs._
 import org.scalatest.concurrent.AsyncAssertions._
 import org.scalatest.time.SpanSugar._
 
-import org.apache.spark.SparkFunSuite
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.sql.execution.streaming.FakeFileSystem._
 import org.apache.spark.sql.test.SharedSQLContext
 
 class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {
 
+
--- End diff --

nit: extra line





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57367862
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
 ---
@@ -196,4 +194,107 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: 
SQLContext, path: String)
 }
 None
   }
+
+  private def createFileManager(): FileManager = {
+val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+try {
+  new FileContextManager(metadataPath, hadoopConf)
+} catch {
+  case e: UnsupportedFileSystemException =>
+logWarning("Could not use FileContext API for managing metadata 
log file. The log may be" +
+  "inconsistent under failures.", e)
+new FileSystemManager(metadataPath, hadoopConf)
+}
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed 
by HDFSMetadataLog */
+  trait FileManager {
+def list(path: Path, filter: PathFilter): Array[FileStatus]
+def mkdirs(path: Path): Unit
+def exists(path: Path): Boolean
+def open(path: Path): FSDataInputStream
+def create(path: Path): FSDataOutputStream
+def rename(srcPath: Path, destPath: Path): Unit
+def deleteOnExit(path: Path): Unit
+  }
+
+  /** Implementation of FileManager using newer FileContext API */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends 
FileManager {
+private val fc = if (path.toUri.getScheme == null) {
+  FileContext.getFileContext(hadoopConf)
+} else {
+  FileContext.getFileContext(path.toUri, hadoopConf)
+}
+
+override def list(path: Path, filter: PathFilter): Array[FileStatus] = 
{
+  fc.util.listStatus(path, filter)
+}
+
+override def rename(srcPath: Path, destPath: Path): Unit = {
+  fc.rename(srcPath, destPath)
+}
+
+override def mkdirs(path: Path): Unit = {
+  fc.mkdir(path, FsPermission.getDirDefault, true)
+}
+
+override def open(path: Path): FSDataInputStream = {
+  fc.open(path)
+}
+
+override def create(path: Path): FSDataOutputStream = {
+  fc.create(path, EnumSet.of(CreateFlag.CREATE))
+}
+
+override def exists(path: Path): Boolean = {
+  fc.util().exists(path)
+}
+
+override def deleteOnExit(path: Path): Unit = {
+  fc.deleteOnExit(path)
+}
+  }
+
+  /** Implementation of FileManager using older FileSystem API */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends 
FileManager {
+private val fs = if (path.toUri.getScheme == null) {
+  FileSystem.get(hadoopConf)
+} else {
+  FileSystem.get(path.toUri, hadoopConf)
+}
+
+override def list(path: Path, filter: PathFilter): Array[FileStatus] = 
{
+  fs.listStatus(path, filter)
+}
+
+override def rename(srcPath: Path, destPath: Path): Unit = {
+  if (fs.exists(destPath)) {
+throw new FileAlreadyExistsException(s"File already exists: 
$destPath")
+  }
+  fs.rename(srcPath, destPath)
--- End diff --

Unlike `fc.rename`, `fs.rename` will return `false` if it fails. We should 
throw an exception for that.
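
A minimal sketch of what that could look like, assuming the manager holds a `FileSystem` handle. The names `RenameSketch` and `renameOrThrow` are illustrative only and do not come from the patch; the exception messages are placeholders as well.

```scala
import java.io.{FileNotFoundException, IOException}

import org.apache.hadoop.fs.{FileAlreadyExistsException, FileSystem, Path}

// Hypothetical helper, not part of the PR.
object RenameSketch {

  /**
   * Rename srcPath to destPath, converting a `false` return value from
   * FileSystem.rename into an IOException. The exists checks followed by
   * the rename are separate calls, so this is still not atomic.
   */
  def renameOrThrow(fs: FileSystem, srcPath: Path, destPath: Path): Unit = {
    if (!fs.exists(srcPath)) {
      throw new FileNotFoundException(s"Source path does not exist: $srcPath")
    }
    if (fs.exists(destPath)) {
      throw new FileAlreadyExistsException(s"Destination path already exists: $destPath")
    }
    if (!fs.rename(srcPath, destPath)) {
      throw new IOException(s"Failed to rename $srcPath to $destPath")
    }
  }
}
```

This only makes failures visible; another writer can still create `destPath` between the checks and the rename.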





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11925#discussion_r57367309
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
 ---
@@ -196,4 +194,107 @@ class HDFSMetadataLog[T: ClassTag](sqlContext: 
SQLContext, path: String)
 }
 None
   }
+
+  private def createFileManager(): FileManager = {
+val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
+try {
+  new FileContextManager(metadataPath, hadoopConf)
+} catch {
+  case e: UnsupportedFileSystemException =>
+logWarning("Could not use FileContext API for managing metadata 
log file. The log may be" +
+  "inconsistent under failures.", e)
+new FileSystemManager(metadataPath, hadoopConf)
+}
+  }
+}
+
+object HDFSMetadataLog {
+
+  /** A simple trait to abstract out the file management operations needed 
by HDFSMetadataLog */
+  trait FileManager {
+def list(path: Path, filter: PathFilter): Array[FileStatus]
+def mkdirs(path: Path): Unit
+def exists(path: Path): Boolean
+def open(path: Path): FSDataInputStream
+def create(path: Path): FSDataOutputStream
+def rename(srcPath: Path, destPath: Path): Unit
+def deleteOnExit(path: Path): Unit
+  }
+
+  /** Implementation of FileManager using newer FileContext API */
+  class FileContextManager(path: Path, hadoopConf: Configuration) extends 
FileManager {
+private val fc = if (path.toUri.getScheme == null) {
+  FileContext.getFileContext(hadoopConf)
+} else {
+  FileContext.getFileContext(path.toUri, hadoopConf)
+}
+
+override def list(path: Path, filter: PathFilter): Array[FileStatus] = 
{
+  fc.util.listStatus(path, filter)
+}
+
+override def rename(srcPath: Path, destPath: Path): Unit = {
+  fc.rename(srcPath, destPath)
+}
+
+override def mkdirs(path: Path): Unit = {
+  fc.mkdir(path, FsPermission.getDirDefault, true)
+}
+
+override def open(path: Path): FSDataInputStream = {
+  fc.open(path)
+}
+
+override def create(path: Path): FSDataOutputStream = {
+  fc.create(path, EnumSet.of(CreateFlag.CREATE))
+}
+
+override def exists(path: Path): Boolean = {
+  fc.util().exists(path)
+}
+
+override def deleteOnExit(path: Path): Unit = {
+  fc.deleteOnExit(path)
+}
+  }
+
+  /** Implementation of FileManager using older FileSystem API */
+  class FileSystemManager(path: Path, hadoopConf: Configuration) extends 
FileManager {
+private val fs = if (path.toUri.getScheme == null) {
--- End diff --

You can just use `path.getFileSystem(hadoopConf)`.
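
For illustration, a small sketch of that suggestion. `GetFileSystemSketch` and `fileSystemFor` are hypothetical names; the only Hadoop call used is `Path.getFileSystem`, which resolves a scheme-less path against the default file system in the configuration, so the explicit scheme check is not needed.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of the PR.
object GetFileSystemSketch {

  /**
   * Resolve the FileSystem that owns `path`. A path without a scheme
   * resolves to the default file system configured in `hadoopConf`.
   */
  def fileSystemFor(path: Path, hadoopConf: Configuration): FileSystem = {
    path.getFileSystem(hadoopConf)
  }
}
```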





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200962816
  
test this please.






[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200701222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54006/
Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200701133
  
**[Test build #54006 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54006/consoleFull)**
 for PR 11925 at commit 
[`dcf8096`](https://github.com/apache/spark/commit/dcf80967b6f85f30d34db5730ead611b83912f53).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200682795
  
**[Test build #54006 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54006/consoleFull)**
 for PR 11925 at commit 
[`dcf8096`](https://github.com/apache/spark/commit/dcf80967b6f85f30d34db5730ead611b83912f53).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200612231
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53991/
Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200612221
  
**[Test build #53991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53991/consoleFull)**
 for PR 11925 at commit 
[`c490a25`](https://github.com/apache/spark/commit/c490a25a3f0606477575e9832c7100bc675081a3).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200612229
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200611865
  
**[Test build #53991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53991/consoleFull)**
 for PR 11925 at commit 
[`c490a25`](https://github.com/apache/spark/commit/c490a25a3f0606477575e9832c7100bc675081a3).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200610889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53990/
Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200610876
  
**[Test build #53990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53990/consoleFull)**
 for PR 11925 at commit 
[`f3dade7`](https://github.com/apache/spark/commit/f3dade7127a7ccde96c002abc6ad75dac5337d6d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  trait FileManager `
  * `  class FileContextManager(path: Path, hadoopConf: Configuration) 
extends FileManager `
  * `  class FileSystemManager(path: Path, hadoopConf: Configuration) 
extends FileManager `





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200610470
  
@marmbrus @zsxwing @JoshRosen 





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200610884
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200610164
  
**[Test build #53990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53990/consoleFull)**
 for PR 11925 at commit 
[`f3dade7`](https://github.com/apache/spark/commit/f3dade7127a7ccde96c002abc6ad75dac5337d6d).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/11925

[SPARK-14109][SQL] Fix HDFSMetadataLog to fallback to FileSystem

## What changes were proposed in this pull request?

HDFSMetadataLog uses the newer FileContext API to achieve atomic renaming. 
However, FileContext implementations may not exist for many schemes for which 
there are FileSystem implementations. In those cases, rather than failing 
completely, we should fall back to the FileSystem-based implementation and log 
a warning that there may be file consistency issues if the log directory is 
concurrently modified.
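
As a rough sketch of the condition that triggers the fallback (not the code in this patch), the check below tries to obtain a `FileContext` for a path and reports whether the scheme is supported. `FallbackSketch` and `supportsFileContext` are hypothetical names introduced here for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileContext, Path, UnsupportedFileSystemException}

// Hypothetical helper, not part of the PR.
object FallbackSketch {

  /**
   * Returns true if a FileContext can be created for the scheme of `path`.
   * When this returns false, a caller would have to use the FileSystem API,
   * which does not guarantee atomic renames.
   */
  def supportsFileContext(path: Path, hadoopConf: Configuration): Boolean = {
    try {
      if (path.toUri.getScheme == null) {
        FileContext.getFileContext(hadoopConf)
      } else {
        FileContext.getFileContext(path.toUri, hadoopConf)
      }
      true
    } catch {
      case _: UnsupportedFileSystemException => false
    }
  }
}
```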


## How was this patch tested?

Unit test. 
Cluster test pending.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-14109

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11925.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11925


commit f3dade7127a7ccde96c002abc6ad75dac5337d6d
Author: Tathagata Das 
Date:   2016-03-24T01:40:00Z

Fix HDFSMetadataLog to fallback to FileSystem



