Re:Re: Re: Re: Re: Refactor the code of HadoopTableOptions

lisoda Sun, 14 Jul 2024 03:35:49 -0700

Regarding HadoopTableOperations: if the filesystem supports a rename operation 
that does not overwrite the target file, HadoopTableOperations does not need to 
use LockManager at all. One reason LockManager is still there is simply that 
the original implementation used it. If desired, we can remove the use of 
LockManager and avoid using it directly on object stores. It is then possible 
to implement atomic operations without relying on LockManager.
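
To illustrate the idea, here is a minimal Python sketch on a local filesystem. 
It is not Iceberg code: the function name commit_version and the temporary file 
naming are hypothetical, and os.link stands in for a "rename that does not 
overwrite the target". It assumes the filesystem supports hard links and that 
link creation fails atomically when the target already exists.

```python
import os
import tempfile
import uuid


def commit_version(table_dir: str, version: int, metadata: str) -> bool:
    """Try to publish `metadata` as version `version`.

    Returns True if this writer won the commit, False if another writer
    already published the same version. No external lock is taken.
    """
    final_path = os.path.join(table_dir, f"v{version}.metadata.json")
    # Hypothetical temp name; it only needs to be unique per writer.
    temp_path = os.path.join(table_dir, f"{uuid.uuid4().hex}.tmp")

    # Write the full metadata to a private temp file first.
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(metadata)
        f.flush()
        os.fsync(f.fileno())

    try:
        # os.link fails with FileExistsError if final_path already exists,
        # so at most one writer can publish a given version.
        os.link(temp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temp name is no longer needed whether or not we won.
        os.remove(temp_path)
```

A writer that gets False back would re-read the latest version and retry with 
the next version number, which is plain optimistic concurrency. On stores 
without such a primitive this sketch does not apply.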

At 2024-07-13 04:44:31, "Ryan Blue" <[email protected]> wrote:

FileIO purposely does not support a rename operation because we wanted to keep 
a minimal API that handled object stores correctly rather than using a 
FileSystem concept. While we may need some extensions outside of what the core 
provides for reading and writing tables, I think we still need to be careful 
here.


We have also been discouraging the use of HadoopTableOperations for several 
years now. Maybe updating it to use locks and moving it to a separate module is 
a good compromise, but my strong preference is for removing it.


On Thu, Jul 11, 2024 at 11:08 PM lisoda <[email protected]> wrote:

Hi, Sir.
I've finished adding support for the usual distributed locks. I don't think 
we'll need to extend the distributed locks again for a long time.


PR: https://github.com/apache/iceberg/pull/10688


As a next step, I'm going to try to extend FileIO to support operations like 
rename. It would be great if you could give me your opinion on this.
Also, please let me know if there is anything I can do to support the creation 
of views.


Thanks.
Regards
lisoda

At 2024-07-05 16:09:54, "Jean-Baptiste Onofré" <[email protected]> wrote:
>Hi,
>
>Actually the JDBC catalog relies on the RDBMS backend for locking.
>That's one of the reasons why we are using a single RDBMS table for
>both tables and views. So, I don't think we would need a lock
>mechanism for JDBC, the RDBMS one is OK for now.
>About FileIO, we can always extend it, but as it's used in different
>Iceberg layers (like ResolvingFileIO for instance), we have to be
>careful about adding new operations here, especially if they are specific
>to HadoopCatalog table/view operations. I will take a look.
>
>Thanks !
>Regards
>JB
>
>On Thu, Jul 4, 2024 at 4:49 PM lisoda <[email protected]> wrote:
>>
>> Yes. If I'm not mistaken, the JDBC catalog has the same problem with 
>> concurrent commits. It doesn't have any locks to control concurrency. In 
>> other words, LockManager can be used for JdbcCatalog as well.
>>
>> Also, regarding unbundling Hadoop, I have a suggestion. Can we extend the 
>> FileIO interface so that all operations are implemented using FileIO?
>>
>> At 2024-07-04 23:38:30, "Jean-Baptiste Onofré" <[email protected]> wrote:
>> >Yeah, I agree with the distributed locking service. Maybe we can
>> >imagine a pluggable (by configuration) lock service depending on the
>> >user's infrastructure.
>> >
>> >For the view support, I can take a look (as I worked on the JDBC
>> >catalog view support).
>> >
>> >Anyway, I'm gonna take a look at your PR. Thanks again for your 
>> >contribution !
>> >
>> >Regards
>> >JB
>> >
>> >On Thu, Jul 4, 2024 at 4:05 PM lisoda <[email protected]> wrote:
>> >>
>> >> Hello.
>> >> Yes. Improving the commit mechanism is just the beginning. We also need 
>> >> to implement a distributed locking service for users who use object 
>> >> stores. I think the next step is to support Iceberg views and the like.
>> >> But I've never used Iceberg's views before. It will take me some time to 
>> >> familiarise myself with the view functionality if I'm to be of any 
>> >> assistance. But if you need my help, I'll do whatever I can.
>> >> Anyway, I'm glad to hear from you.
>> >>
>> >> At 2024-07-04 22:04:17, "Jean-Baptiste Onofré" <[email protected]> wrote:
>> >> >Hi,
>> >> >
>> >> >Thanks for the heads up and working on this !
>> >> >
>> >> >My understanding of the HadoopCatalog is that we would need more than
>> >> >an improved commit mechanism to be production ready (I'm thinking of
>> >> >scalability, or view support). What are your thoughts?
>> >> >By the way, I'm happy to take a look at adding view support if it helps.
>> >> >
>> >> >Regards
>> >> >JB
>> >> >
>> >> >On Thu, Jul 4, 2024 at 8:27 AM lisoda <[email protected]> wrote:
>> >> >>
>> >> >> Hi Team.
>> >> >> I've refactored the logic of the commit method in 
>> >> >> HadoopTableOperations. With this refactoring, I believe that 
>> >> >> HadoopCatalog is ready to be used in a production environment. 
>> >> >> HadoopTableOperations can now implement atomic commits while 
>> >> >> accommodating the differences in behaviour between block and object 
>> >> >> stores. Concurrency control is also supported. If anyone can assist 
>> >> >> me in reviewing this PR, that would be great.
>> >> >> Also, any FileSystemCatalog user is welcome to comment on this PR. Any 
>> >> >> advice would be invaluable to me.
>> >> >> Thank you all.
>> >> >>
>> >> >> PR: https://github.com/apache/iceberg/pull/10623
>> >> >> Slack: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1719993403208859

--

Ryan Blue
Databricks
