[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17079

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17079#discussion_r103373620

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
```diff
@@ -178,6 +178,33 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
+      val catalog =
+        new InMemoryFileIndex(spark, Seq(dirPath), Map.empty, None, fileStatusCache) {
+          def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+          def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+        }
+
+      assert(catalog.leafDirPaths.isEmpty)
+      assert(catalog.leafFilePaths.isEmpty)
```
--- End diff ---

Move these two asserts after `stringToFile`
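The review point above is about ordering: the index caches its listing, so the emptiness asserts are most meaningful after the file has been written, proving the cache is still stale until `refresh()` runs. A minimal self-contained toy model of that behavior (plain Scala with the JDK only; `CachedIndex` and all names are hypothetical, not Spark code):

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical toy model: an index that lists a directory once at
// construction time and only re-lists on refresh(), mirroring the
// cached-listing behavior the test above exercises.
class CachedIndex(dir: File) {
  private var files: Seq[String] = list()
  private def list(): Seq[String] =
    Option(dir.listFiles()).toSeq.flatten.map(_.getName)
  def leafFiles: Seq[String] = files
  def refresh(): Unit = { files = list() }
}

object RefreshSketch {
  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("idx").toFile
    val index = new CachedIndex(dir)
    assert(index.leafFiles.isEmpty) // nothing in the directory yet

    Files.write(new File(dir, "text.txt").toPath, "text".getBytes)
    // The listing was cached at construction, so the new file is
    // invisible until refresh() — this is what the moved asserts check.
    assert(index.leafFiles.isEmpty)

    index.refresh()
    assert(index.leafFiles == Seq("text.txt")) // refresh picks up the file
    println("ok")
  }
}
```

Asserting emptiness only before the file exists would pass even if the cache were never consulted; asserting it after `stringToFile` is what pins down the stale-cache-then-refresh contract.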
[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17079#discussion_r103373646

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
```diff
@@ -178,6 +178,33 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val fs = dirPath.getFileSystem(spark.sessionState.newHadoopConf())
+      val catalog =
+        new InMemoryFileIndex(spark, Seq(dirPath), Map.empty, None, fileStatusCache) {
+          def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+          def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+        }
```
--- End diff ---

Nit: indentation is off in the three lines above.
[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17079#discussion_r103357633

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
```diff
@@ -178,6 +178,34 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val catalog = new InMemoryFileIndex(spark, Seq(dirPath), Map.empty,
+        None, fileStatusCache) {
+        def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+        def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+      }
+
+      assert(catalog.leafDirPaths.isEmpty)
+      assert(catalog.leafFilePaths.isEmpty)
+
+      val file = new File(dir, "text.txt")
+      stringToFile(file, "text")
+
+      catalog.refresh()
+
+      assert(catalog.leafFilePaths.size == 1)
+      assert(catalog.leafFilePaths.head.toString.stripSuffix("/") ==
+        s"file:${file.getAbsolutePath.stripSuffix("/")}")
```
--- End diff ---

ok, let me modify~ thanks~
[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17079#discussion_r103350023

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
```diff
@@ -178,6 +178,34 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val catalog = new InMemoryFileIndex(spark, Seq(dirPath), Map.empty,
+        None, fileStatusCache) {
+        def leafFilePaths: Seq[Path] = leafFiles.keys.toSeq
+        def leafDirPaths: Seq[Path] = leafDirToChildrenFiles.keys.toSeq
+      }
+
+      assert(catalog.leafDirPaths.isEmpty)
+      assert(catalog.leafFilePaths.isEmpty)
+
+      val file = new File(dir, "text.txt")
+      stringToFile(file, "text")
+
+      catalog.refresh()
+
+      assert(catalog.leafFilePaths.size == 1)
+      assert(catalog.leafFilePaths.head.toString.stripSuffix("/") ==
+        s"file:${file.getAbsolutePath.stripSuffix("/")}")
```
--- End diff ---

this looks hacky, can you turn them into `Path` and compare?
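The "hacky" part being flagged is string surgery: hand-building a `file:` prefix and `stripSuffix("/")` on both sides to make two spellings of the same location comparable. Since Hadoop's `Path` wraps a URI, the suggestion amounts to comparing structured objects whose equality already normalizes representation details. A minimal sketch of that idea using only the JDK (`java.net.URI` standing in for Hadoop's `Path`; this is illustrative, not code from the PR):

```scala
import java.io.File
import java.net.URI

object PathCompareSketch {
  def main(args: Array[String]): Unit = {
    val file = new File("/tmp/dir/text.txt")

    // String-based comparison needs fragile fix-ups on both sides:
    val byString: String = s"file:${file.getAbsolutePath.stripSuffix("/")}"

    // Structured comparison: build the canonical object once and rely on
    // its equality. File.toURI normalizes the scheme and separators.
    val byUri: URI = file.toURI

    // Two independently constructed URIs for the same file compare equal,
    // with no manual prefix/suffix handling.
    assert(byUri == new File(file.getAbsolutePath).toURI)
    println(byString)
    println(byUri)
  }
}
```

In the Spark test itself, the analogous fix would be comparing `catalog.leafFilePaths.head` against a `Path` built from `file` (e.g. `new Path(file.toURI)`) — an assumption about the eventual change, not a quote from the PR.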
[GitHub] spark pull request #17079: [SPARK-19748][SQL]refresh function has a wrong or...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17079#discussion_r103349639

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala ---
```diff
@@ -178,6 +178,34 @@ class FileIndexSuite extends SharedSQLContext {
       assert(catalog2.allFiles().nonEmpty)
     }
   }
+
+  test("refresh for InMemoryFileIndex with FileStatusCache") {
+    withTempDir { dir =>
+      val fileStatusCache = FileStatusCache.getOrCreate(spark)
+      val dirPath = new Path(dir.getAbsolutePath)
+      val catalog = new InMemoryFileIndex(spark, Seq(dirPath), Map.empty,
+        None, fileStatusCache) {
```
--- End diff ---

nit:
```
val catalog = new XXX(...) {
  def xxx
}
```