BentsiLeviav commented on PR #36059:
URL: https://github.com/apache/beam/pull/36059#issuecomment-3280084883

   Hi folks
   
   First, @Suvrat1629 , huge thanks for the contribution.
   
   I needed to be more specific and detailed in #33692, my apologies.
   
   As of today, this is the existing behavior:
   * We only take care of `DEFAULT` type (we don't take care of default types 
that are of the form of `Materialization` and `Alias`)
   * Even for `Default`, we have a small set of supported types
   * **We don't use the defaults as of this moment** - the `RowBinary` format 
doesn't use them, and they are just being ignored. 
   * This functionality is not supposed to be handled on the client side (as it 
is impossible to materialize all ClickHouse expressions in Java), and 
ClickHouse created the `RowBinaryWithDefaults` format to move the defaults 
handling into the database itself. 
   
   
   ### Is there any other place we use this?
   
   Yes.
   When we implemented the BigQuery to ClickHouse Dataflow template, we 
received from BigQuery `TableRow` - a schemaless object. In order to convert it 
to schema schema-aware type, we fetched the schema from ClickHouse and [used 
this function to convert each string value to the proper Java 
representation](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/44b3463faa012e7845e8d44ee8ca785aacf8122e/v2/googlecloud-to-clickhouse/src/main/java/com/google/cloud/teleport/v2/clickhouse/templates/BigQueryToClickHouse.java#L212).
 
   
   
   The original idea behind #33692 was not only to add these types, but also to 
solve/decouple the tension between the different needs (the first need is to 
evaluate defaults, the second is to evaluate primitive values). 
   
   
   As I commented here 
https://github.com/apache/beam/issues/33692#issuecomment-3150988953, we plan to 
upgrade the ClickHouse client, support the use of `RowBinaryWithDefaults`, and 
refactor the `parseDefaultExpression` to work with primitive types only. 
   
   
   ### Suggested implementation in this PR
   
   The main issue with the existing implementation (that is being extended with 
this PR) is that in case users use none primitive default values (which is 
usually the case), the entire pipe will collapse. 
   
   For instance, given the following target table:
   
   ```SQL
   CREATE OR REPLACE TABLE test
   (
       id UInt64,
       updated_at DateTime DEFAULT now(),
       updated_at_date Date DEFAULT toDate(updated_at)
   )
   ENGINE = MergeTree
   ORDER BY id;
   ```
   
   The `DESRIBE` command output will be:
   
   <img width="1171" height="350" alt="image" 
src="https://github.com/user-attachments/assets/ece786fd-b15a-4051-ac4b-8b5b6ba182ae";
 />
   
   
   and the entire pipe will get broken (the code will try to create a DateTime 
object of the string `now()`). 
   
   
   So to conclude, the right solution for this problem would be to implement 
support for the `RowBinaryWithDefaults` format and refactor+rename the 
`parseDefaultExpression` to support only primitive objects.
   
   CC @Abacn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to