[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2022-01-09 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1008248624


   > @dongkelun LGTM, just left some minor comments. @xushiyan please further review 
whether this strategy makes sense.
   
   Except for two points I still have doubts about, the other review comments have been addressed and committed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2022-01-04 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1005402855


   > @nsivabalan no, this won't go to 0.10.1 as it introduces a new config. 
@dongkelun as this won't be included in 0.10.1, can we hold this off until next 
week to land? Just trying to avoid potential conflicts.
   
   OK. If you have time, could you review it first? I'll commit the changes that 
need to be made first, and then we can land it next week.






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2022-01-02 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003813774


   @YannByron @xushiyan Hello, I have updated and committed the code according 
to the new solution. Could you take a look?
   






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2022-01-02 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003690600


   @hudi-bot run azure






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-31 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003327348


   > > @dongkelun @xushiyan I offer another solution to discuss.
   > > Querying incrementally in Hive requires setting 
`hoodie.%s.consume.start.timestamp`, which is used in 
`HoodieHiveUtils.readStartCommitTime`. Currently, we pass the 
`hoodie.table.name` named `tableName` to this function. We can add the configs 
`hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and 
`hoodie.database.name` in `HoodieTableConfig`. If `database.name` is provided, 
we join the `database.name` and `table.name` and pass the result to 
`readStartCommitTime`. Users can then set 
`hoodie.dbName.tableName.consume.start.timestamp` in Hive and query.
   > > Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` 
can be reused in other scenarios.
   > > @xushiyan what do you think.
   > 
   > @xushiyan @YannByron I think I understand the solution.
   > 
   > SQL will persist the database name to `hoodie.properties` by default; 
DataFrame writes persist it selectively through an optional database parameter. 
Then, in an incremental query, if `databaseName.tableName` is set, we match 
against `databaseName.tableName`. If it is inconsistent, or there is no 
databaseName, the incremental query will not be performed; if it is consistent, 
the incremental query is performed. If the incremental query does not have a 
database name set, we match only the table name, not the database name.
   > 
   > So, which parameter should DataFrame writes use to persist the database name?
   
   @xushiyan Hello, do you think this idea is OK? If so, I'll submit a version 
following this idea first.
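   The matching rule described above can be sketched as follows. This is a hypothetical illustration only: `matches_incremental_target` is not a real Hudi function, and the actual lookup lives in `HoodieHiveUtils`.

```python
def matches_incremental_target(spec, database_name, table_name):
    """Hypothetical sketch of the matching rule discussed above.

    spec is the name the user set for the incremental query,
    e.g. "db1.tbl1" or just "tbl1". If spec contains a database name,
    both parts must match; otherwise only the table name is compared.
    """
    if "." in spec:
        # A database name was set: require an exact db.table match.
        # (If no database name is persisted, this never matches.)
        return spec == f"{database_name}.{table_name}"
    # No database name in the spec: match on the table name only.
    return spec == table_name

print(matches_incremental_target("db1.tbl1", "db1", "tbl1"))  # True
print(matches_incremental_target("db2.tbl1", "db1", "tbl1"))  # False: inconsistent database
print(matches_incremental_target("tbl1", "db1", "tbl1"))      # True: table name only
```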






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002442373


   > @dongkelun @xushiyan I offer another solution to discuss.
   > 
   > Querying incrementally in Hive requires setting 
`hoodie.%s.consume.start.timestamp`, which is used in 
`HoodieHiveUtils.readStartCommitTime`. Currently, we pass the 
`hoodie.table.name` named `tableName` to this function. We can add the configs 
`hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and 
`hoodie.database.name` in `HoodieTableConfig`. If `database.name` is provided, 
we join the `database.name` and `table.name` and pass the result to 
`readStartCommitTime`. Users can then set 
`hoodie.dbName.tableName.consume.start.timestamp` in Hive and query.
   > 
   > Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` 
can be reused in other scenarios.
   > 
   > @xushiyan what do you think.
   
   @xushiyan @YannByron I think I understand the solution.
   
   SQL will persist the database name to `hoodie.properties` by default; 
DataFrame writes persist it selectively through an optional database parameter. 
Then, in an incremental query, if `databaseName.tableName` is set, we match 
against `databaseName.tableName`. If it is inconsistent, or there is no 
databaseName, the incremental query will not be performed; if it is consistent, 
the incremental query is performed. If the incremental query does not have a 
database name set, we match only the table name, not the database name.
   
   So, which parameter should DataFrame writes use to persist the database name?
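   To make the proposal concrete, here is a minimal sketch of the property-name construction it describes. The helper below is hypothetical (the real substitution happens around `HoodieHiveUtils.readStartCommitTime`); it only illustrates joining the database and table names before filling the `hoodie.%s.consume.start.timestamp` template.

```python
def consume_start_timestamp_key(table_name, database_name=None):
    """Sketch of the proposed Hive property name.

    Per the discussion: when hoodie.database.name is provided, the
    database and table names are joined before being substituted into
    the hoodie.%s.consume.start.timestamp template; otherwise only the
    table name is used, as today.
    """
    name = f"{database_name}.{table_name}" if database_name else table_name
    return f"hoodie.{name}.consume.start.timestamp"

print(consume_start_timestamp_key("tbl1"))         # hoodie.tbl1.consume.start.timestamp
print(consume_start_timestamp_key("tbl1", "db1"))  # hoodie.db1.tbl1.consume.start.timestamp
```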






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002137086


   @hudi-bot run azure






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002113021


   @hudi-bot run azure






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1001969965


   > @dongkelun @xushiyan I'm sorry, but I don't support this PR as the way to solve 
the problems of `set the start query time of the table` and `query 
incrementally`. Some points we should think about:
   > 
   > 1. As it stands, this PR only works for Spark SQL. What about Spark DataFrame 
writes? We should support both.
   > 2. After adding a `database` config, whether we get the `database` value from 
an individual config like `hoodie.datasource.write.database.name` or by parsing 
the existing `hoodie.datasource.write.table.name`/`hoodie.table.name` when 
`hoodie.sql.uses.database.table.name` is enabled, we'll have four related 
options: `hoodie.datasource.hive_sync.table`, `hoodie.datasource.hive_sync.database`, 
and the two mentioned above. Users then have to learn all of these. Can we 
combine and simplify them?
   > 
   > IMO, Hudi, with its mountain of configs, already has a high threshold of use. 
We should choose solutions that balance functionality and user experience as 
far as possible.
   
   @YannByron Hello,
   1. For Spark DataFrame writes, we can use `hoodie.table.name` to specify the 
table name.
   2. Because the database name can be specified when creating tables in Spark 
SQL, it is not specified through `hoodie.database.name` or other configurations. 
I think `hoodie.sql.uses.database.table.name` is just a switch that decides 
whether SQL should include the database name in `hoodie.table.name`. It does 
not conflict with other configurations.
   As for combining other duplicate configuration items, I think we can solve 
that in a separate PR.
   
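   For reference, a hedged sketch of the write options involved in point 1 above. `hoodie.table.name` is an existing Hudi option; `hoodie.datasource.write.database.name` is only the config *proposed* in this thread, not a settled API, and the table/database names are placeholders.

```python
# Hypothetical option map for a Hudi Spark DataFrame write, per the
# discussion above. Names "db1"/"tbl1" are illustrative placeholders.
hudi_options = {
    # Existing config: the table name used for a DataFrame write.
    "hoodie.table.name": "tbl1",
    # Proposed in this thread (not a settled config at the time of writing):
    # optionally persist the database name alongside the table name.
    "hoodie.datasource.write.database.name": "db1",
}

# With a SparkSession and the Hudi bundle available, the write itself
# would look roughly like:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```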






[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-11-25 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-978926101


   @xushiyan @YannByron Hi, can you please help review this PR?

