Re: [I] flink:FlinkSink support dynamically changed schema [iceberg]

via GitHub Wed, 17 Apr 2024 10:33:19 -0700


pvary commented on issue #4190:
URL: https://github.com/apache/iceberg/issues/4190#issuecomment-2061836288


   I think this feature is not trivial to implement, because the schema of the 
RowData objects that are the input of the Sink is finalized when the job graph 
is created. To change the schema, one needs to regenerate the job graph, 
essentially restarting the job (calling the main method).
   There might be ways to work around this, either by changing the input to 
records in which the schema is embedded (at a performance cost), or by 
getting the schema from an outside source (an additional external dependency), 
but either would need some deeper changes in the Sink.
   Care should also be taken over how to synchronize the table schema refresh 
across the tasks when the changes are detected.
   
   As a workaround, we created our own schema check that runs before the input 
is converted to RowData and throws a `SuppressRestartsException` when changes 
are detected.
   We used the Flink Kubernetes Operator to restart the job from the failed 
state, using `kubernetes.operator.job.restart.failed`. On restart, the `main` 
method refreshes the table and the new job instance is started with the new 
schema.
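
   A minimal sketch of what such a schema check could look like, written in 
plain Java so it compiles without Flink on the classpath. This is not our 
actual code: `SchemaGuard`, `SchemaChangedException` and the name-to-type map 
are hypothetical names used for illustration only. In a real job the exception 
thrown would be Flink's `SuppressRestartsException`, and the check would sit in 
the operator that runs just before the conversion to RowData.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical guard; not part of Iceberg or Flink. */
final class SchemaGuard {

    /**
     * Stand-in for Flink's SuppressRestartsException (assumption: in a real
     * job that exception would be thrown instead, so the job goes straight
     * to the failed state rather than being restarted by Flink itself).
     */
    static final class SchemaChangedException extends RuntimeException {
        SchemaChangedException(String message) {
            super(message);
        }
    }

    // Column name -> type name, captured once in main() when the job graph is built.
    private final Map<String, String> expected;

    SchemaGuard(Map<String, String> expected) {
        this.expected = new LinkedHashMap<>(expected);
    }

    /**
     * Compares the columns of an incoming record with the schema the job was
     * started with. Map equality ignores column order, so only added, removed
     * or retyped columns count as a change.
     */
    void check(Map<String, String> incoming) {
        if (!expected.equals(incoming)) {
            throw new SchemaChangedException(
                "Schema changed: expected " + expected + " but got " + incoming);
        }
    }
}
```

   With the operator option above enabled, the failed job is resubmitted, 
`main` runs again and builds the guard (and the Sink) from the refreshed table 
schema.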


