Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
codope merged PR #10414: URL: https://github.com/apache/hudi/pull/10414 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874882886 ## CI report: * c64e1e3a9816b278606ee32aede728ffb928708c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21800) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874800023 ## CI report: * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781) * c64e1e3a9816b278606ee32aede728ffb928708c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21800)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874796616 ## CI report: * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781) * c64e1e3a9816b278606ee32aede728ffb928708c UNKNOWN
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
bhat-vinay commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874797572 Thanks for the review, @bvaradar. @codope pointed out that the failing tests could be fixed by https://github.com/apache/hudi/pull/10381. I rebased past it to see if I can get a clean run.
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873940685 ## CI report: * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873933395 ## CI report: * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d UNKNOWN
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873670181 ## CI report: * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873620067 ## CI report: * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760) * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873606885 ## CI report: * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760) * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d UNKNOWN
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
bvaradar commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1872341655 @bhat-vinay : Landed the other PR. Please resolve conflicts and rebase
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1871882269 ## CI report: * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1871793208 ## CI report: * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708) * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1871789951 ## CI report: * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708) * 898fd87176b5546dbceb7062998ff517b2ec347e UNKNOWN
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
bvaradar commented on code in PR #10414:
URL: https://github.com/apache/hudi/pull/10414#discussion_r1437380985

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala: ##
@@ -192,6 +192,69 @@ class TestHoodieTableValuedFunction extends HoodieSparkSqlTestBase {
     }
   }
 
+  test(s"Test hudi_filesystem_view") {
+    if (HoodieSparkUtils.gteqSpark3_2) {
+      withTempDir { tmp =>
+        Seq(
+          ("cow", true),
+          ("mor", true),
+          ("cow", false),
+          ("mor", false)
+        ).foreach { parameters =>
+          val tableType = parameters._1
+          val isTableId = parameters._2
+
+          val tableName = generateTableName
+          val tablePath = s"${tmp.getCanonicalPath}/$tableName"
+          val identifier = if (isTableId) tableName else tablePath
+          spark.sql("set hoodie.sql.insert.mode = non-strict")
+
+          spark.sql(
+            s"""
+               |create table $tableName (
+               |  id int,
+               |  name string,
+               |  price double
+               |) using hudi
+               |partitioned by (price)
+               |tblproperties (
+               |  type = '$tableType',
+               |  primaryKey = 'id'
+               |)
+               |location '$tablePath'
+               |""".stripMargin
+          )
+
+          spark.sql(
+            s"""
+               | insert into $tableName
+               | values (1, 'a1', 10.0), (2, 'a2', 20.0), (3, 'a3', 30.0)
+               | """.stripMargin
+          )
+          spark.sql(
+            s"""
+               | insert into $tableName
+               | values (4, 'a4', 10.0), (5, 'a5', 20.0), (6, 'a6', 30.0)
+               | """.stripMargin
+          )
+          val result1DF = spark.sql(s"select * from hudi_filesystem_view('$identifier', 'price*')")
+          result1DF.show(false)
+          val result1Array = result1DF.select(
+            col("Partition_Path")
+          ).orderBy("Partition_Path").take(10)
+          checkAnswer(result1Array)(
+            Seq("price=10.0"),

Review Comment:
@bhat-vinay : Sounds good. The PR says it is dependent on https://github.com/apache/hudi/pull/10355 . So, waiting for that PR to finish before landing this.
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
bhat-vinay commented on code in PR #10414:
URL: https://github.com/apache/hudi/pull/10414#discussion_r1437361618

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala: ##
@@ -192,6 +192,69 @@ class TestHoodieTableValuedFunction extends HoodieSparkSqlTestBase {

Review Comment:
The FileSystemView output also includes the partition path. The test table is partitioned on the `price` column, so the partition directories under the base path are named `price=10.0`, and so on.
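The naming described in the review comment above can be illustrated with a small sketch in plain Scala. It does not use Spark or Hudi: `Record`, `hiveStylePath`, and `partitionPaths` are hypothetical names introduced only for this illustration, and the `<column>=<value>` directory layout is taken from the `price=10.0` example in the comment rather than from Hudi's actual path-generation code.

```scala
// Illustration only: none of these names are part of Hudi or Spark.
object PartitionPathSketch {
  // Mirrors the test table's columns: id int, name string, price double.
  final case class Record(id: Int, name: String, price: Double)

  // The six rows written by the two insert statements in the test.
  val rows: List[Record] = List(
    Record(1, "a1", 10.0), Record(2, "a2", 20.0), Record(3, "a3", 30.0),
    Record(4, "a4", 10.0), Record(5, "a5", 20.0), Record(6, "a6", 30.0)
  )

  // Assumed layout: one directory per partition value, named <column>=<value>.
  def hiveStylePath(column: String, value: Any): String = s"$column=$value"

  // Distinct partition directories, sorted the way the test orders Partition_Path.
  def partitionPaths(records: List[Record]): List[String] =
    records.map(r => hiveStylePath("price", r.price)).distinct.sorted

  def main(args: Array[String]): Unit =
    partitionPaths(rows).foreach(println)
}
```

Under these assumptions, the six rows fall into three partition directories, one per distinct `price` value, which is why a filesystem-level view of the table reports values such as `price=10.0` in its `Partition_Path` column even though it does not return table data columns.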
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
bvaradar commented on code in PR #10414:
URL: https://github.com/apache/hudi/pull/10414#discussion_r1437277470

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala: ##
@@ -192,6 +192,69 @@ class TestHoodieTableValuedFunction extends HoodieSparkSqlTestBase {

Review Comment:
@bhat-vinay : One basic question. I am not sure I follow this test case. With the FileSystemView TVF, we expect only columns like fileId, filesize, ..., right? Why do we expect the column price to be returned?
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1870171321 ## CI report: * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1870002783 ## CI report: * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708)
Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]
hudi-bot commented on PR #10414: URL: https://github.com/apache/hudi/pull/10414#issuecomment-1869998596 ## CI report: * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 UNKNOWN