I am trying to ingest data into an Iceberg table using Spark Streaming. At
the moment there is only one writer for this data. According to the Iceberg
API docs
<https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/IsolationLevel.html#:%7E:text=Both%20of%20them%20provide%20a,environments%20with%20many%20concurrent%20writers.>,
the default isolation level for a table is serializable.

If only a single application (a single Spark Streaming job in my case) is
writing to the Iceberg table, is there any advantage or disadvantage to
using serializable rather than snapshot isolation? Is there any performance
impact from using serializable when only one application writes to the
table?

Also, it seems Iceberg lets every writer write its own snapshot and uses
optimistic concurrency control (OCC) to decide whether a writer has to retry
because it committed late. In that case, how is it serializable at all?
Isn't serializability achieved via pessimistic concurrency control? I would
like to understand how Iceberg implements the serializable isolation level
and how it differs from snapshot isolation.
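
To make the question concrete, here is a minimal sketch of how I picture an
optimistic commit. It is a toy model in Python, not Iceberg's code:
ToyCatalog, compare_and_swap and commit_with_retry are hypothetical names
made up for illustration, and the "snapshot" is just an integer.

```python
import threading


class ToyCatalog:
    """Hypothetical stand-in for a catalog that tracks the current snapshot id."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshot_id = 0

    def compare_and_swap(self, expected, new):
        # The lock is held only for this check-and-set, never while a
        # writer is producing data files.
        with self._lock:
            if self.snapshot_id != expected:
                return False  # someone else committed first
            self.snapshot_id = new
            return True


def commit_with_retry(catalog, max_retries=5):
    """Try to commit on top of the snapshot we read; retry if we were late."""
    for attempt in range(1, max_retries + 1):
        base = catalog.snapshot_id  # snapshot this write is based on
        # ... data files would be written against `base` here ...
        if catalog.compare_and_swap(base, base + 1):
            return attempt
        # Lost the race: loop again against the newer snapshot.
    raise RuntimeError("commit failed after retries")


catalog = ToyCatalog()
attempts = commit_with_retry(catalog)
```

In this model no writer ever blocks another; the one that loses the race
simply retries. With a single writer the first attempt always succeeds,
which is why I am unsure where the two isolation levels would behave
differently.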

Thanks