pvary commented on issue #4190: URL: https://github.com/apache/iceberg/issues/4190#issuecomment-2061836288
I think it is not trivial to implement this feature, as the schema of the RowData objects which are the input of the Sink is finalized when the job graph is created. To change the schema one needs to regenerate the job graph, essentially restarting the job (calling the `main` method). There might be some way to work around this, by changing the input to records where the schema is embedded in the records (performance loss), or by getting the schema from an outside source (additional external dependency), but this would need some deeper changes in the Sink. Care should also be taken over how to synchronize the table schema refresh across the tasks when the changes are detected.

As a workaround, we created our own schema check before converting the input to RowData, and throw a `SuppressRestartsException` when changes are detected. We used Flink Kubernetes Operator to restart the job from the failed state, using `kubernetes.operator.job.restart.failed`. The `main` method refreshes the table and the new job instance is started with the new schema.
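
A minimal sketch of the kind of schema check described above, not the actual code from the comment. It assumes the schema can be represented as a map from column name to type name, and it uses a hypothetical stand-in exception (`SuppressRestartsStandIn`) so that the example compiles without Flink on the classpath. The class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Flink's SuppressRestartsException, declared here
// only so the sketch is self-contained. A real job would throw Flink's class.
class SuppressRestartsStandIn extends RuntimeException {
    SuppressRestartsStandIn(Throwable cause) {
        super(cause);
    }
}

// Illustrative check that runs before a record is converted to RowData.
class SchemaCheckSketch {
    // Schema captured when main() built the job graph (assumed shape:
    // column name -> type name).
    private final Map<String, String> schemaAtJobStart;

    SchemaCheckSketch(Map<String, String> schemaAtJobStart) {
        // Defensive copy so later changes by the caller do not affect the check.
        this.schemaAtJobStart = new LinkedHashMap<>(schemaAtJobStart);
    }

    // Fails the job when the incoming schema differs from the one the job
    // graph was built with. Map equality ignores column order.
    void check(Map<String, String> incomingSchema) {
        if (!schemaAtJobStart.equals(incomingSchema)) {
            throw new SuppressRestartsStandIn(
                new IllegalStateException(
                    "Schema changed: expected " + schemaAtJobStart
                        + " but got " + incomingSchema));
        }
    }
}
```

In a real job the stand-in would be replaced by Flink's `SuppressRestartsException`, so that the failure bypasses the configured restart strategy and the job ends in the failed state. The operator can then resubmit it with `kubernetes.operator.job.restart.failed` enabled, which runs `main` again and picks up the refreshed table schema.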