[GitHub] [iceberg] SreeramGarlapati opened a new pull request #2660: Spark3 structured streaming micro_batch read support

GitBox Tue, 01 Jun 2021 23:39:52 -0700


SreeramGarlapati opened a new pull request #2660:
URL: https://github.com/apache/iceberg/pull/2660



   This work extends the idea in issue 
https://github.com/apache/iceberg/issues/179 and the Spark2 work done in PR #2272; 
the difference is that this PR targets Spark3.
   
   **In the current implementation:**
   * An Iceberg Snapshot is the upper bound for a MicroBatch. A given MicroBatch 
spans only a single Snapshot; it is never composed of multiple 
Snapshots. BatchSize limits the number of files within a given 
Snapshot.
   * The streaming reader errors out if it encounters any Snapshot whose 
type is not `APPEND`.
   * Handling `DELETES`, `REPLACE` and `OVERWRITES` is left for future work.
   * Columnar reads are not enabled; this is also left for future work.
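
The batching rules listed above can be illustrated with a small standalone sketch. This is not the code in the PR: `Snapshot`, `plan_micro_batches` and `max_files_per_batch` are hypothetical names chosen for illustration, and the sketch assumes each snapshot simply carries the list of files it added.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for an Iceberg snapshot.
    snapshot_id: int
    operation: str          # e.g. "append", "overwrite", "replace", "delete"
    added_files: List[str]  # files added by this snapshot


def plan_micro_batches(
    snapshots: Iterable[Snapshot], max_files_per_batch: int
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (snapshot_id, files) pairs, one pair per micro-batch.

    A batch never crosses a snapshot boundary, and it holds at most
    max_files_per_batch files. Any non-append snapshot raises an error.
    """
    if max_files_per_batch < 1:
        raise ValueError("max_files_per_batch must be at least 1")
    for snap in snapshots:
        if snap.operation != "append":
            raise ValueError(
                f"snapshot {snap.snapshot_id} has unsupported operation "
                f"{snap.operation!r}; only 'append' is handled"
            )
        # Slice this snapshot's files into chunks of the configured size.
        for start in range(0, len(snap.added_files), max_files_per_batch):
            end = start + max_files_per_batch
            yield snap.snapshot_id, snap.added_files[start:end]
```

With a batch size of 2, a snapshot that added three files followed by a snapshot that added one file produces three micro-batches: two for the first snapshot and one for the second. The leftover file of the first snapshot is not merged with the file of the second.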
   
   cc: @aokolnychyi & @RussellSpitzer & @holdenk @rdblue @rdsr


