[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...

2017-02-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17079





[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17079#discussion_r103373620
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -178,6 +178,33 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
+      val catalog =
+        new InMemoryFileIndex(spark, Seq(dirPath), Map.empty, None, fileStatusCache) {
+        def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+        def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+      }
+
+      assert(catalog.leafDirPaths.isEmpty)
+      assert(catalog.leafFilePaths.isEmpty)
--- End diff --

Move these two asserts after `stringToFile`
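A minimal sketch of the suggested ordering, reusing the names from the quoted test (illustrative only, not necessarily the exact final patch): the listing is computed when the index is built, so both asserts still hold right after the file is written and only change once `refresh()` re-lists the directory.

```
// After stringToFile, the cached listing is still the old (empty) one.
val file = new File(dir, "text.txt")
stringToFile(file, "text")

assert(catalog.leafDirPaths.isEmpty)
assert(catalog.leafFilePaths.isEmpty)

catalog.refresh()
assert(catalog.leafFilePaths.size == 1)
```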





[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17079#discussion_r103373646
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -178,6 +178,33 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
+      val catalog =
+        new InMemoryFileIndex(spark, Seq(dirPath), Map.empty, None, fileStatusCache) {
+        def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+        def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+      }
--- End diff --

Nit: indentation issues for the above three lines.
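One possible reading of the nit, shown as a hypothetical reformatting of the quoted lines with the anonymous-class members indented one level deeper:

```
val catalog =
  new InMemoryFileIndex(spark, Seq(dirPath), Map.empty, None, fileStatusCache) {
    def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
    def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
  }
```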





[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...

2017-02-27 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17079#discussion_r103357633
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -178,6 +178,34 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val catalog = new InMemoryFileIndex(spark, Seq(dirPath), Map.empty,
+        None, fileStatusCache) {
+        def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+        def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+      }
+
+      assert(catalog.leafDirPaths.isEmpty)
+      assert(catalog.leafFilePaths.isEmpty)
+
+      val file = new File(dir, "text.txt")
+      stringToFile(file, "text")
+
+      catalog.refresh()
+
+      assert(catalog.leafFilePaths.size == 1)
+      assert(catalog.leafFilePaths.head.toString.stripSuffix("/") ==
+        s"file:${file.getAbsolutePath.stripSuffix("/")}")
--- End diff --

ok, let me modify~ thanks~





[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...

2017-02-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17079#discussion_r103350023
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -178,6 +178,34 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val catalog = new InMemoryFileIndex(spark, Seq(dirPath), Map.empty,
+        None, fileStatusCache) {
+        def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+        def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+      }
+
+      assert(catalog.leafDirPaths.isEmpty)
+      assert(catalog.leafFilePaths.isEmpty)
+
+      val file = new File(dir, "text.txt")
+      stringToFile(file, "text")
+
+      catalog.refresh()
+
+      assert(catalog.leafFilePaths.size == 1)
+      assert(catalog.leafFilePaths.head.toString.stripSuffix("/") ==
+        s"file:${file.getAbsolutePath.stripSuffix("/")}")
--- End diff --

this looks hacky, can you turn them into `Path` and compare?
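A sketch of one way to make the comparison `Path`-based, assuming a Hadoop `FileSystem` handle obtained from `dirPath` as in the other quoted version of the test (illustrative, not necessarily the wording of the final patch):

```
// Compare fully qualified Path objects instead of stripping and rebuilding strings.
val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())

assert(catalog.leafFilePaths.size == 1)
assert(catalog.leafFilePaths.head == fs.makeQualified(new Path(file.getAbsolutePath)))
```

Hadoop `Path` equality is URI-based, so qualifying both sides avoids the scheme/trailing-slash fiddling done with `stripSuffix`.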





[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...

2017-02-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17079#discussion_r103349639
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
@@ -178,6 +178,34 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val catalog = new InMemoryFileIndex(spark, Seq(dirPath), Map.empty,
+        None, fileStatusCache) {
--- End diff --

nit:
```
val catalog =
  new XXX(...) {
    def xxx
  }
```

