Hi Peter,

On a similar topic, I created a PR to support custom schemes in ResolvingFileIO (https://github.com/apache/iceberg/pull/9884). Maybe FlinkFileIO could be a new scheme/extension in ResolvingFileIO.
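To make the "flink:" scheme idea a bit more concrete, here is a minimal sketch of scheme-based dispatch in the spirit of ResolvingFileIO, which picks a FileIO implementation from the URI scheme of a location. It is not Iceberg's code and not the mechanism in PR 9884: the `SchemeResolverSketch` class, the `resolve` helper, the map, and the `com.example.FlinkFileIO` class name are all hypothetical. It returns a class name instead of instantiating a FileIO so that it stays self-contained.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of scheme-based FileIO resolution, loosely modeled on the
 * idea behind Iceberg's ResolvingFileIO. Not Iceberg's actual implementation.
 */
class SchemeResolverSketch {

  // Used when a location has no scheme or an unknown one (assumption for this sketch).
  static final String FALLBACK_IMPL = "org.apache.iceberg.hadoop.HadoopFileIO";

  // Maps a lower-case URI scheme to the FileIO class name that should handle it.
  static final Map<String, String> SCHEME_TO_IMPL = Map.of(
      "s3", "org.apache.iceberg.aws.s3.S3FileIO",
      "gs", "org.apache.iceberg.gcp.gcs.GCSFileIO",
      // Hypothetical: route "flink:" locations to a Flink-FileSystem-backed FileIO.
      "flink", "com.example.FlinkFileIO");

  /** Returns the FileIO class name that would handle the given location. */
  static String resolve(String location) {
    String scheme = URI.create(location).getScheme();
    if (scheme == null) {
      return FALLBACK_IMPL;
    }
    // URI schemes are case-insensitive, so normalize before the lookup.
    return SCHEME_TO_IMPL.getOrDefault(scheme.toLowerCase(Locale.ROOT), FALLBACK_IMPL);
  }

  public static void main(String[] args) {
    System.out.println(resolve("s3://bucket/warehouse/db/t/metadata/v1.metadata.json"));
    System.out.println(resolve("flink://hdfs-nameservice/warehouse/db/t"));
    System.out.println(resolve("/tmp/warehouse/db/t"));
  }
}
```

With this shape, an `s3://` location maps to S3FileIO, a `flink://` location maps to the hypothetical Flink-backed implementation, and a scheme-less path falls back to HadoopFileIO. Because the resolver only holds class names as strings, the Flink classes would only need to be on the runtime classpath when that scheme is actually used, which is one way to keep engine-specific dependencies out of Iceberg Core.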
While I agree that it would be interesting to have support for FlinkFileIO, I'm not sure it's a good idea to have it directly in Iceberg. I think it would be great to leverage the extension mechanism we already have in Iceberg (FileIO/ResolvingFileIO). Iceberg Core should not include engine-specific dependencies, imho. However, having a "flink:" scheme in ResolvingFileIO that delegates to FlinkFileIO could be interesting.

Just thinking out loud :)

Regards
JB

On Fri, Apr 19, 2024 at 12:08 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>
> Hi Iceberg Team,
>
> Flink has its own FileSystem implementation. See:
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
> This FileSystem already has several implementations:
>
> Hadoop
> Azure
> S3
> Google Cloud Storage
> ...
>
> As a general rule in Flink, one should use this FileSystem to consume and persistently store data.
> If these FileSystems are configured, then Flink makes sure that the configurations are consistent and available to the JobManager and TaskManagers (JM/TM).
> As an added benefit, delegation tokens are handled and distributed for these FileSystems automatically. See:
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-delegation-token/
>
> In house, some of our new users are struggling with parameterizing HadoopFileIO and S3FileIO for Iceberg. They have trouble wrapping their heads around having to provide different configurations for checkpointing and for Iceberg table storage, even when both are stored in the same bucket or on the same HDFS cluster.
>
> I have created a PR which provides a FileIO implementation that uses the Flink FileSystem. Very imaginatively, I have named it FlinkFileIO. See:
> https://github.com/apache/iceberg/pull/10151
>
> This would allow users to configure the FileSystem only once and use this FileSystem to access Iceberg tables. Also, if for whatever reason the global nature of the Flink FileSystem configuration is limiting, users could still revert to using the other FileIO implementations.
>
> What do you think? Would this be a useful addition to the Iceberg-Flink integration?
>
> Thanks,
> Peter