Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


codope merged PR #10414:
URL: https://github.com/apache/hudi/pull/10414


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874882886

   
   ## CI report:
   
   * c64e1e3a9816b278606ee32aede728ffb928708c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21800)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874800023

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
   * c64e1e3a9816b278606ee32aede728ffb928708c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21800)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874796616

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   * c64e1e3a9816b278606ee32aede728ffb928708c UNKNOWN
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


bhat-vinay commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874797572

   Thanks for the review @bvaradar. @codope pointed out that the failing tests 
could be fixed by https://github.com/apache/hudi/pull/10381. I rebased past it to 
see if I can get a clean run.





Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873940685

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873933395

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d UNKNOWN
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-01 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873670181

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-01 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873620067

   
   ## CI report:
   
   * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760)
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-01 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873606885

   
   ## CI report:
   
   * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760)
 
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d UNKNOWN
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-29 Thread via GitHub


bvaradar commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1872341655

   @bhat-vinay: I have landed the other PR. Please resolve the conflicts and rebase.





Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-29 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1871882269

   
   ## CI report:
   
   * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-28 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1871793208

   
   ## CI report:
   
   * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708)
   * 898fd87176b5546dbceb7062998ff517b2ec347e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21760)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-28 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1871789951

   
   ## CI report:
   
   * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708)
 
   * 898fd87176b5546dbceb7062998ff517b2ec347e UNKNOWN
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-27 Thread via GitHub


bvaradar commented on code in PR #10414:
URL: https://github.com/apache/hudi/pull/10414#discussion_r1437380985


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala:
##
@@ -192,6 +192,69 @@ class TestHoodieTableValuedFunction extends HoodieSparkSqlTestBase {
     }
   }
 
+  test(s"Test hudi_filesystem_view") {
+    if (HoodieSparkUtils.gteqSpark3_2) {
+      withTempDir { tmp =>
+        Seq(
+          ("cow", true),
+          ("mor", true),
+          ("cow", false),
+          ("mor", false)
+        ).foreach { parameters =>
+          val tableType = parameters._1
+          val isTableId = parameters._2
+
+          val tableName = generateTableName
+          val tablePath = s"${tmp.getCanonicalPath}/$tableName"
+          val identifier = if (isTableId) tableName else tablePath
+          spark.sql("set hoodie.sql.insert.mode = non-strict")
+
+          spark.sql(
+            s"""
+               |create table $tableName (
+               |  id int,
+               |  name string,
+               |  price double
+               |) using hudi
+               |partitioned by (price)
+               |tblproperties (
+               |  type = '$tableType',
+               |  primaryKey = 'id'
+               |)
+               |location '$tablePath'
+               |""".stripMargin
+          )
+
+          spark.sql(
+            s"""
+               | insert into $tableName
+               | values (1, 'a1', 10.0), (2, 'a2', 20.0), (3, 'a3', 30.0)
+               | """.stripMargin
+          )
+          spark.sql(
+            s"""
+               | insert into $tableName
+               | values (4, 'a4', 10.0), (5, 'a5', 20.0), (6, 'a6', 30.0)
+               | """.stripMargin
+          )
+          val result1DF = spark.sql(s"select * from hudi_filesystem_view('$identifier', 'price*')")
+          result1DF.show(false)
+          val result1Array = result1DF.select(
+              col("Partition_Path")
+            ).orderBy("Partition_Path").take(10)
+          checkAnswer(result1Array)(
+            Seq("price=10.0"),

Review Comment:
   @bhat-vinay: Sounds good. The PR description says it depends on 
https://github.com/apache/hudi/pull/10355, so I am waiting for that PR to finish 
before landing this one.






Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-27 Thread via GitHub


bhat-vinay commented on code in PR #10414:
URL: https://github.com/apache/hudi/pull/10414#discussion_r1437361618


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala:
##

Review Comment:
   The FileSystemView also shows the partition path. The test table is 
partitioned on the `price` column, so the partition directories (under the base 
path) are named `price=10.0`, etc.
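
   To make the expected values concrete, here is a minimal, Spark-free sketch of how the six rows inserted by the test map onto partition directory names. It assumes hive-style `column=value` naming, as the `price=10.0` expectation in the test suggests. `PartitionPathSketch`, `Record`, and `hiveStylePartitionPath` are hypothetical names used only for illustration; they are not part of Hudi.

```scala
// Illustrative only: mimics the directory naming seen in the test,
// it does not call into Hudi or Spark.
object PartitionPathSketch {
  // Mirrors the test table schema (id int, name string, price double).
  final case class Record(id: Int, name: String, price: Double)

  // Assumption: partition directories follow the hive-style "column=value" layout.
  def hiveStylePartitionPath(column: String, value: Any): String =
    s"$column=$value"

  // The rows written by the two insert statements in the test.
  val records: Seq[Record] = Seq(
    Record(1, "a1", 10.0), Record(2, "a2", 20.0), Record(3, "a3", 30.0),
    Record(4, "a4", 10.0), Record(5, "a5", 20.0), Record(6, "a6", 30.0)
  )

  // Distinct partition directories, ordered the way the test orders Partition_Path.
  val partitionPaths: Seq[String] =
    records.map(r => hiveStylePartitionPath("price", r.price)).distinct.sorted

  def main(args: Array[String]): Unit =
    partitionPaths.foreach(println)
}
```

   Under that assumption, the two inserts touch three partition directories, which is why the test compares `Partition_Path` values rather than values of a `price` data column.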






Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-27 Thread via GitHub


bvaradar commented on code in PR #10414:
URL: https://github.com/apache/hudi/pull/10414#discussion_r1437277470


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieTableValuedFunction.scala:
##

Review Comment:
   @bhat-vinay: One basic question, as I am not sure I follow this test case. 
With the FileSystemView TVF, we expect only columns like fileId, filesize, 
... right? Why do we expect the column price to be returned?






Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-27 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1870171321

   
   ## CI report:
   
   * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-26 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1870002783

   
   ## CI report:
   
   * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21708)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2023-12-26 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1869998596

   
   ## CI report:
   
   * 02890a9f3e2ae80f64beb2cc41bb2a8a98c1a9e3 UNKNOWN
   
   