BentsiLeviav commented on PR #36059: URL: https://github.com/apache/beam/pull/36059#issuecomment-3280084883
Hi folks First, @Suvrat1629 , huge thanks for the contribution. I needed to be more specific and detailed in #33692, my apologies. As of today, this is the existing behavior: * We only take care of `DEFAULT` type (we don't take care of default types that are of the form of `Materialization` and `Alias`) * Even for `Default`, we have a small set of supported types * **We don't use the defaults as of this moment** - the `RowBinary` format doesn't use them, and they are just being ignored. * This functionality is not supposed to be handled on the client side (as it is impossible to materialize all ClickHouse expressions in Java), and ClickHouse created the `RowBinaryWithDefaults` format to move the defaults handling into the database itself. ### Is there any other place we use this? Yes. When we implemented the BigQuery to ClickHouse Dataflow template, we received from BigQuery `TableRow` - a schemaless object. In order to convert it to schema schema-aware type, we fetched the schema from ClickHouse and [used this function to convert each string value to the proper Java representation](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/44b3463faa012e7845e8d44ee8ca785aacf8122e/v2/googlecloud-to-clickhouse/src/main/java/com/google/cloud/teleport/v2/clickhouse/templates/BigQueryToClickHouse.java#L212). The original idea behind #33692 was not only to add these types, but also to solve/decouple the tension between the different needs (the first need is to evaluate defaults, the second is to evaluate primitive values). As I commented here https://github.com/apache/beam/issues/33692#issuecomment-3150988953, we plan to upgrade the ClickHouse client, support the use of `RowBinaryWithDefaults`, and refactor the `parseDefaultExpression` to work with primitive types only. ### Suggested implementation in this PR The main issue with the existing implementation (that is being extended with this PR) is that in case users use none primitive default values (which is usually the case), the entire pipe will collapse. For instance, given the following target table: ```SQL CREATE OR REPLACE TABLE test ( id UInt64, updated_at DateTime DEFAULT now(), updated_at_date Date DEFAULT toDate(updated_at) ) ENGINE = MergeTree ORDER BY id; ``` The `DESRIBE` command output will be: <img width="1171" height="350" alt="image" src="https://github.com/user-attachments/assets/ece786fd-b15a-4051-ac4b-8b5b6ba182ae" /> and the entire pipe will get broken (the code will try to create a DateTime object of the string `now()`). So to conclude, the right solution for this problem would be to implement support for the `RowBinaryWithDefaults` format and refactor+rename the `parseDefaultExpression` to support only primitive objects. CC @Abacn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
