[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1008248624

> @dongkelun LGTM, just left some minor comments. @xushiyan Further review whether this strategy makes sense.

Apart from two open questions, the remaining review comments have been addressed and the changes pushed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1005402855

> @nsivabalan no this won't go to 0.10.1 as it introduces new config. @dongkelun as this won't be included in 0.10.1, can we hold this off until next week to land? just try to avoid potential conflicts.

OK. If you have time, could you review it first? I'll push the required changes now, and we can land it next week.
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003813774

@YannByron @xushiyan Hello, I have updated the code according to the new solution and pushed it. Could you take a look?
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003690600

@hudi-bot run azure
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003327348

> > @dongkelun @xushiyan I offer another solution to discuss.
> >
> > Querying incrementally in Hive requires setting `hoodie.%s.consume.start.timestamp`, which is used in `HoodieHiveUtils.readStartCommitTime`. Currently, we pass the `hoodie.table.name` value, named `tableName`, to this function. We can add the configs `hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and `hoodie.database.name` in `HoodieTableConfig`. If `database.name` is provided, we concatenate the `database.name` and `table.name` and pass the result to `readStartCommitTime`. Users can then set `hoodie.dbName.tableName.consume.start.timestamp` in Hive and run the query.
> >
> > Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` can be reused in other scenarios.
> >
> > @xushiyan what do you think.
>
> @xushiyan @YannByron I think I understand the solution.
>
> Spark SQL will persist the database name to `hoodie.properties` by default, while the DataFrame path persists it selectively through an optional database parameter. Then, for an incremental query: if `databaseName.tableName` is set, we match against `databaseName.tableName`; if it is inconsistent, or no database name was persisted, the incremental query is not performed; if it is consistent, the incremental query runs. If the incremental query does not set a database name, we do not match the database name, only the table name.
>
> So, which parameter should the DataFrame writer use to persist the database name?

@xushiyan Hello, do you think this idea is OK? If so, I'll submit a version based on it first.
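The matching behaviour described above could be sketched as a small predicate. This is an illustrative reading of the proposal, not Hudi's actual implementation; the function name and signature are hypothetical.

```python
def should_run_incremental(query_name, persisted_table, persisted_db=None):
    """Decide whether an incremental query applies, per the idea above.

    query_name is the name the user embedded in the
    consume.start.timestamp key. A qualified name (db.table) must match
    both the persisted database and table name; an unqualified name is
    matched against the table name only. Hypothetical sketch.
    """
    if "." in query_name:
        db, _, table = query_name.partition(".")
        return persisted_db is not None and db == persisted_db and table == persisted_table
    return query_name == persisted_table

print(should_run_incremental("warehouse.trips", "trips", "warehouse"))  # True
print(should_run_incremental("warehouse.trips", "trips"))               # False: no persisted db
print(should_run_incremental("trips", "trips", "warehouse"))            # True: table-only match
```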
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002442373

> @dongkelun @xushiyan I offer another solution to discuss.
>
> Querying incrementally in Hive requires setting `hoodie.%s.consume.start.timestamp`, which is used in `HoodieHiveUtils.readStartCommitTime`. Currently, we pass the `hoodie.table.name` value, named `tableName`, to this function. We can add the configs `hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and `hoodie.database.name` in `HoodieTableConfig`. If `database.name` is provided, we concatenate the `database.name` and `table.name` and pass the result to `readStartCommitTime`. Users can then set `hoodie.dbName.tableName.consume.start.timestamp` in Hive and run the query.
>
> Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` can be reused in other scenarios.
>
> @xushiyan what do you think.

@xushiyan @YannByron I think I understand the solution.

Spark SQL will persist the database name to `hoodie.properties` by default, while the DataFrame path persists it selectively through an optional database parameter. Then, for an incremental query: if `databaseName.tableName` is set, we match against `databaseName.tableName`; if it is inconsistent, or no database name was persisted, the incremental query is not performed; if it is consistent, the incremental query runs. If the incremental query does not set a database name, we do not match the database name, only the table name.

So, which parameter should the DataFrame writer use to persist the database name?
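The key construction in the proposal above can be sketched as follows. The helper function and its parameter names are illustrative assumptions, not code from the PR; only the `hoodie.%s.consume.start.timestamp` pattern comes from the discussion.

```python
def consume_start_timestamp_key(table_name, database_name=None):
    """Build the Hive session property key for Hudi incremental queries.

    Per the proposal: when a database name is available, the key becomes
    hoodie.<db>.<table>.consume.start.timestamp; otherwise it falls back
    to the table name alone. Illustrative sketch only.
    """
    qualified = f"{database_name}.{table_name}" if database_name else table_name
    return f"hoodie.{qualified}.consume.start.timestamp"

print(consume_start_timestamp_key("trips"))
# hoodie.trips.consume.start.timestamp
print(consume_start_timestamp_key("trips", "warehouse"))
# hoodie.warehouse.trips.consume.start.timestamp
```

A user would then issue `SET hoodie.warehouse.trips.consume.start.timestamp=<commit time>;` in Hive before running the incremental query, with the key derived as above.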
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002137086

@hudi-bot run azure
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002113021

@hudi-bot run azure
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1001969965

> @dongkelun @xushiyan I'm sorry, but I don't support this PR as the solution to the problems of setting the start query time of a table and querying incrementally. Some points we should think about:
>
> 1. As this PR stands, it only works for Spark SQL. What about Spark DataFrame writes? We should support both.
> 2. After adding a `database` config, whether we get the `database` value from an individual config like `hoodie.datasource.write.database.name` or parse it from the existing `hoodie.datasource.write.table.name`/`hoodie.table.name` when `hoodie.sql.uses.database.table.name` is enabled, we will have four related options: `hoodie.datasource.hive_sync.table`, `hoodie.datasource.hive_sync.database`, and the two mentioned above. Users then have to learn all of these. Can we combine and simplify them?
>
> IMO, Hudi with a mountain of configs already has a high threshold of use. We should choose solutions that balance functionality and user experience as far as possible.

@YannByron Hello:

1. For Spark DataFrame writes, we can use `hoodie.table.name` to specify the table name.
2. Because the database name can be specified when creating a table in Spark SQL, it is not specified through `hoodie.database.name` or other configurations. I think `hoodie.sql.uses.database.table.name` is just a switch controlling whether Spark SQL should include the database name in `hoodie.table.name`. It does not conflict with other configurations.

As for combining the other duplicate configuration items, I think we can address that in a separate PR.
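For point 1, the DataFrame write path would pass the name through the write options. The sketch below only builds the option map; `hoodie.table.name` is a real Hudi config, but passing a database-qualified value through it is the workaround discussed in this thread, not documented behaviour, and the other option values shown are illustrative.

```python
# Minimal sketch of Hudi write options for a Spark DataFrame write.
# The qualified "warehouse.trips" value reflects this thread's proposal
# of carrying the database name inside hoodie.table.name; field names
# below are placeholder examples.
hudi_options = {
    "hoodie.table.name": "warehouse.trips",
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.precombine.field": "ts",
}

# The actual write would then look roughly like:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)

database_name, _, table_name = hudi_options["hoodie.table.name"].partition(".")
print(database_name, table_name)  # warehouse trips
```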
dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-978926101

@xushiyan @YannByron Hi, could you please help review this PR?