------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: Monday, December 13, 2021, 11:59 AM
To: "dev" <[email protected]>;
Subject: Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support
small file compaction
Hi all,
After a lot of discussions with different parties, we received very
fruitful feedback and reworked the ideas behind this FLIP. Initially, we
had the impression that the compaction problem is solvable by a single
topology that we can reuse across different sinks. We now have a
better understanding that different external systems require different
compaction mechanisms, e.g., Hive requires compaction before finally
registering the file in the metastore, whereas Iceberg registers the
files first and compacts them lazily afterwards.
Considering all these different views we came up with a design that
builds upon what @[email protected] and @[email protected] have
proposed at the beginning. We allow inserting custom topologies before
and after the SinkWriters and Committers. Furthermore, we do not see
it as a downside that the Sink interfaces exposing the DataStream
to the user reside in flink-streaming-java, in contrast to the basic
Sink interfaces that reside in flink-core; we deem them to be used only
by expert users.
Moreover, we also wanted to remove the global committer from the
unified Sink interfaces and replace it with a custom post-commit
topology. Unfortunately, we cannot do it without breaking the Sink
interface since the GlobalCommittables are part of the parameterized
Sink interface. Thus, we propose building a new Sink V2 interface
consisting of composable interfaces that do not offer the
GlobalCommitter anymore. We will implement a utility to extend a Sink
with a post-commit topology that mimics the behavior of the GlobalCommitter.
The new Sink V2 provides the same sort of methods as the Sink V1
interface, so a migration of sinks that do not use the GlobalCommitter
should be very easy.
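The composable Sink V2 idea with a post-commit utility can be illustrated with a small sketch. This is hypothetical Python pseudocode of the concept only, not the actual Flink API; the class names, method signatures, and the compaction step are assumptions for illustration:

```python
# Hypothetical sketch (NOT the Flink API): a basic sink plus a utility
# that attaches a post-commit step mimicking the old GlobalCommitter,
# e.g. to compact files after they were committed.

from dataclasses import dataclass


@dataclass
class Committable:
    path: str
    size: int


class FileSink:
    """Basic sink: a writer producing committables and a committer."""

    def __init__(self):
        self.committed = []

    def write(self, record, committables):
        # each record produces one (small) file in this toy model
        committables.append(Committable(path=f"file-{record}", size=1))

    def commit(self, committables):
        self.committed.extend(committables)


def with_post_commit_topology(sink, post_commit):
    """Utility extending a sink with a custom step that runs after commit."""
    original_commit = sink.commit

    def commit(committables):
        original_commit(committables)
        post_commit(committables)  # runs after the regular commit

    sink.commit = commit
    return sink
```

A sink that does not need the post-commit step simply skips the wrapper, which matches the claim that migrating sinks without a GlobalCommitter should be easy.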
We plan to keep the existing Sink V1 interfaces to not break
externally built sinks. As part of this FLIP, we migrate all the
connectors inside of the main repository to the new Sink V2 API.
The FLIP document is also updated and includes the proposed changes.
Looking forward to your feedback,
Fabian
https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction
On Thu, Dec 2, 2021 at 10:15 AM Roman Khachatryan <[email protected]> wrote:
>
> Thanks for clarifying (I was initially confused by merging state files
> rather than output files).
>
> > At some point, Flink will definitely have some WAL adapter that can
> > turn any sink into an exactly-once sink (with some caveats). For now, we
> > keep that as an orthogonal solution as it has a rather high price (bursty
> > workload with high latency). Ideally, we can keep the compaction
> > asynchronously...
>
> Yes, that would be something like a WAL. I agree that it would have a
> different set of trade-offs.
>
>
> Regards,
> Roman
>
> On Mon, Nov 29, 2021 at 3:33 PM Arvid Heise <[email protected]> wrote:
> >>
> >> > One way to avoid write-read-merge is by wrapping SinkWriter with
> >> > another one, which would buffer input elements in a temporary storage
> >> > (e.g. local file) until a threshold is reached; after that, it would
> >> > invoke the original SinkWriter. And if a checkpoint barrier comes in
> >> > earlier, it would send written data to some aggregator.
> >>
> >> I think perhaps this seems to be a kind of WAL method? Namely we first
> >> write the elements to some WAL logs and persist them on checkpoint
> >> (in snapshot or remote FS), or we directly write WAL logs to the remote
> >> FS eagerly.
> >>
> > At some point, Flink will definitely have some WAL adapter that can
> > turn any sink into an exactly-once sink (with some caveats). For now, we
> > keep that as an orthogonal solution as it has a rather high price (bursty
> > workload with high latency). Ideally, we can keep the compaction
> > asynchronously...
> >
> > On Mon, Nov 29, 2021 at 8:52 AM Yun Gao <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> @Roman very sorry for the long-delayed response,
> >>
> >> > Merging artifacts from multiple checkpoints would apparently
> >> > require multiple concurrent checkpoints
> >>
> >> I think it might not need concurrent checkpoints: suppose some
> >> operator (like the committer aggregator in option 2) maintains
> >> the list of files to merge, it could store the list of files to merge
> >> in its state; then after several checkpoints are done and we have
> >> enough files, we could merge all the files in the list.
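Yun's idea of accumulating files across checkpoints could look roughly like the following sketch. This is hypothetical Python pseudocode, not Flink API; the state handling, names, and threshold are assumptions:

```python
# Sketch: an aggregator keeping the list of pending files in checkpointed
# state and merging only once enough data has accumulated, so no
# concurrent checkpoints are required.

SIZE_THRESHOLD = 128  # illustrative size threshold (e.g. in MB)


class CommitterAggregator:
    def __init__(self):
        # checkpointed state: (path, size) of files waiting to be merged
        self.pending = []

    def add_file(self, path, size):
        self.pending.append((path, size))

    def on_checkpoint_complete(self):
        """After a checkpoint completes, merge if the pending files are
        large enough; otherwise carry them over to the next checkpoint."""
        total = sum(size for _, size in self.pending)
        if total < SIZE_THRESHOLD:
            return None  # not enough data yet, keep accumulating
        merged = ("merged-" + "+".join(p for p, _ in self.pending), total)
        # originals can be deleted once this merge result is committed
        self.pending = []
        return merged
```

On failover, the pending list is restored from state and the merge simply restarts, which matches the recovery argument below.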
> >>
> >> > Asynchronous merging in an aggregator would require some resolution
> >> > logic on recovery, so that a merged artifact can be used if the
> >> > original one was deleted. Otherwise, wouldn't recovery fail because
> >> > some artifacts are missing?
> >> > We could also defer deletion until the "compacted" checkpoint is
> >> > subsumed - but isn't it too late, as it will be deleted anyways once
> >> > subsumed?
> >>
> >> I think logically we could delete the original files once the
> >> "compacted" checkpoint (which finishes merging the compacted files and
> >> records the result in the checkpoint) is completed, in all the options.
> >> If there is a failover before it, we could restart the merging, and if
> >> there is a failover after it, the files are already recorded in the
> >> checkpoint.
> >>
> >> > One way to avoid write-read-merge is by wrapping SinkWriter with
> >> > another one, which would buffer input elements in a temporary storage
> >> > (e.g. local file) until a threshold is reached; after that, it would
> >> > invoke the original SinkWriter. And if a checkpoint barrier comes in
> >> > earlier, it would send written data to some aggregator.
> >>
> >> I think perhaps this seems to be a kind of WAL method? Namely we first
> >> write the elements to some WAL logs and persist them on checkpoint
> >> (in snapshot or remote FS), or we directly write WAL logs to the remote
> >> FS eagerly.
> >>
> >> Sorry if I do not understand correctly somewhere.
> >>
> >> Best,
> >> Yun
> >>
> >>
> >> ------------------------------------------------------------------
> >> From:Roman Khachatryan <[email protected]>
> >> Send Time:2021 Nov. 9 (Tue.) 22:03
> >> To:dev <[email protected]>
> >> Subject:Re: [DISCUSS] FLIP-191: Extend unified Sink interface to
> >> support small file compaction
> >>
> >> Hi everyone,
> >>
> >> Thanks for the proposal and the discussion, I have some remarks:
> >> (I'm not very familiar with the new Sink API but I thought about the
> >> same problem in the context of the changelog state backend)
> >>
> >> 1. Merging artifacts from multiple checkpoints would apparently
> >> require multiple concurrent checkpoints (otherwise, a new checkpoint
> >> won't be started before completing the previous one; and the previous
> >> one can't be completed before durably storing the artifacts). However,
> >> concurrent checkpoints are currently not supported with Unaligned
> >> checkpoints (this is besides increasing e2e-latency).
> >>
> >> 2. Asynchronous merging in an aggregator would require some resolution
> >> logic on recovery, so that a merged artifact can be used if the
> >> original one was deleted. Otherwise, wouldn't recovery fail because
> >> some artifacts are missing?
> >> We could also defer deletion until the "compacted" checkpoint is
> >> subsumed - but isn't it too late, as it will be deleted anyways once
> >> subsumed?
> >>
> >> 3. Writing small files, then reading and merging them for *every*
> >> checkpoint seems worse than only reading them on recovery. I guess I'm
> >> missing some cases of reading, so to me it would make sense to mention
> >> these cases explicitly in the FLIP motivation section.
> >>
> >> 4. One way to avoid write-read-merge is by wrapping SinkWriter with
> >> another one, which would buffer input elements in a temporary storage
> >> (e.g. local file) until a threshold is reached; after that, it would
> >> invoke the original SinkWriter. And if a checkpoint barrier comes in
> >> earlier, it would send written data to some aggregator. It will
> >> increase checkpoint delay (async phase) compared to the current Flink;
> >> but not compared to the write-read-merge solution, IIUC.
> >> Then such "BufferingSinkWriters" could aggregate input elements from
> >> each other, potentially recursively (I mean something like
> >> https://cwiki.apache.org/confluence/download/attachments/173082889/DSTL-DFS-DAG.png
> >> )
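The wrapping idea in point 4 could be sketched as follows. This is hypothetical Python pseudocode under assumed interfaces; "BufferingSinkWriter" is Roman's illustrative name, not an existing Flink class, and the threshold is arbitrary:

```python
# Sketch: a writer wrapper buffering elements until a threshold, then
# flushing them to the wrapped writer; on a checkpoint barrier, anything
# still buffered is handed to an aggregator instead of being written
# as a (too small) file.

THRESHOLD = 3  # illustrative element-count threshold


class SinkWriter:
    def __init__(self):
        self.written = []

    def write(self, element):
        self.written.append(element)


class BufferingSinkWriter:
    def __init__(self, inner, aggregator):
        self.inner = inner
        self.aggregator = aggregator  # callable receiving leftover elements
        self.buffer = []

    def write(self, element):
        self.buffer.append(element)
        if len(self.buffer) >= THRESHOLD:
            # threshold reached: perform the real write
            for e in self.buffer:
                self.inner.write(e)
            self.buffer.clear()

    def on_checkpoint_barrier(self):
        if self.buffer:
            # too little data for a file: ship it to the aggregator
            self.aggregator(list(self.buffer))
            self.buffer.clear()
```

The aggregator callable is where the recursive fan-in of several BufferingSinkWriters (as in the linked DAG picture) would plug in.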
> >>
> >> 5. Reducing the number of files by reducing aggregator parallelism as
> >> opposed to merging on reaching a size threshold will likely be less
> >> optimal and more difficult to configure. OTOH, thresholds might be more
> >> difficult to implement and (with recursive merging) would incur higher
> >> latency. Maybe that's also something to decide explicitly or at least
> >> mention in the FLIP.
> >>
> >>
> >>
> >> Regards,
> >> Roman
> >>
> >>
> >> On Tue, Nov 9, 2021 at 5:23 AM Reo Lei <[email protected]> wrote:
> >> >
> >> > Hi Fabian,
> >> >
> >> > Thanks for drafting the FLIP and trying to support small file
> >> > compaction. I think this feature is very urgent and valuable for
> >> > users (at least for me).
> >> >
> >> > Currently I am trying to support streaming rewrite (compact) for
> >> > Iceberg on PR#3323 <https://github.com/apache/iceberg/pull/3323>. As
> >> > Steven mentioned, the Iceberg sink writes and compacts data through
> >> > the following steps:
> >> > Step-1: Some parallel data writers (sinkers) write streaming data as
> >> > files.
> >> > Step-2: A single-parallelism data file committer commits the
> >> > completed files as soon as possible to make them available.
> >> > Step-3: Some parallel file rewriters (compactors) collect committed
> >> > files from multiple checkpoints, and rewrite (compact) them together
> >> > once the total file size or number of files reaches the threshold.
> >> > Step-4: A single-parallelism rewrite (compact) result committer
> >> > commits the rewritten (compacted) files to replace the old files and
> >> > make them available.
> >> >
> >> >
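The four steps above can be sketched as a tiny pipeline. This is hypothetical Python pseudocode of the flow only; parallelism handling, thresholds, and names are illustrative assumptions, not the Iceberg PR's implementation:

```python
# Sketch of the four-step Iceberg-style flow: parallel writers, a single
# committer making files available fast, parallel compactors collecting
# files across checkpoints, and a single committer for the rewrite result.

FILES_PER_COMPACTION = 4  # illustrative threshold


def write(ckpt, parallelism=2):
    """Step-1: parallel writers emit one (small) file per subtask."""
    return [f"ckpt{ckpt}-part-{i}" for i in range(parallelism)]


class Table:
    def __init__(self):
        self.available = []  # files visible to downstream readers

    def commit_append(self, files):
        """Step-2: commit completed files ASAP so data is available."""
        self.available.extend(files)

    def commit_rewrite(self, old_files, new_file):
        """Step-4: atomically replace the old small files."""
        self.available = [f for f in self.available if f not in old_files]
        self.available.append(new_file)


def compact_if_ready(collected):
    """Step-3: rewrite collected files once the threshold is reached."""
    if len(collected) < FILES_PER_COMPACTION:
        return None
    return "compacted-" + "-".join(collected)
```

Note that data is readable right after Step-2; the rewrite in Step-3/4 only swaps small files for a large one without delaying availability.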
> >> > If Flink wants to support small file compaction, some key points I
> >> > think are necessary:
> >> >
> >> > 1, Compact files from multiple checkpoints.
> >> > I totally agree with Jingsong, because the completed file size
> >> > usually cannot reach the threshold in a single checkpoint. Especially
> >> > for a partitioned table, we need to compact the files of each
> >> > partition, but usually the file size of each partition will be
> >> > different and may not reach the merge threshold. If we compact these
> >> > files in a single checkpoint, regardless of whether the total file
> >> > size reaches the threshold, then the value of compacting will be
> >> > diminished and we will still get small files, because these compacted
> >> > files do not reach the target size. So we need the compactor to
> >> > collect committed files from multiple checkpoints and compact them
> >> > once they reach the threshold.
> >> >
> >> > 2, Separate the write phase and the compact phase.
> >> > Users usually hope the data becomes available as soon as possible,
> >> > and the end-to-end latency is very important. I think we need to
> >> > separate the write and compact phases. The write phase includes
> >> > Step-1 and Step-2: we sink data as files and commit them per
> >> > checkpoint, regardless of the file size. That ensures the data will
> >> > be available ASAP. The compact phase includes Step-3 and Step-4: the
> >> > compactor should collect committed files from multiple checkpoints
> >> > and compact them asynchronously once they reach the threshold, and
> >> > the compact committer will commit the compaction result in the next
> >> > checkpoint. We compact the committed files asynchronously because we
> >> > don't want the compaction to affect the data sink or the whole
> >> > pipeline.
> >> >
> >> > 3, Exactly-once guarantee between the write and compact phases.
> >> > Once we separate the write phase and compact phase, we need to
> >> > consider how to guarantee exactly-once semantics between the two
> >> > phases. We should not lose any data or files in the compactor
> >> > (Step-3) in any case, which would cause the compaction result to be
> >> > inconsistent with before. I think Flink should provide an easy-to-use
> >> > interface to make that easier.
> >> >
> >> > 4, Metadata operations and compaction result validation.
> >> > In the compact phase, there may be not only compacting of files but
> >> > also a lot of metadata operations, such as Iceberg needing to
> >> > read/write the manifest and do MOR. And we need some interface to
> >> > allow users to do some validation of the compaction result. I think
> >> > these points should be considered when we design the compaction API.
> >> >
> >> >
> >> > Back to FLIP-191, option 1 looks very complicated while option 2 is
> >> > relatively simple, but neither of these two solutions separates the
> >> > write phase from the compact phase. So I think we should consider the
> >> > points I mentioned above. And if you have any other questions you can
> >> > always feel free to reach out to me!
> >> >
> >> > BR,
> >> > Reo
> >> >
> >> > Fabian Paul <[email protected]> wrote on Mon, Nov 8, 2021 at 7:59 PM:
> >> >
> >> > > Hi all,
> >> > >
> >> > > Thanks for the lively discussions. I am really excited to see so
> >> > > many people participating in this thread. It also underlines the
> >> > > need that many people would like to see a solution soon.
> >> > >
> >> > > I have updated the FLIP and removed the parallelism configuration
> >> > > because it is unnecessary since users can configure a constant
> >> > > exchange key to send all committables to only one committable
> >> > > aggregator.
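Routing all committables to one aggregator via a constant exchange key can be shown with a small sketch. This is hypothetical Python pseudocode; in Flink this would correspond to keying the committable stream with a constant key selector, and the partitioning function here is a simplified stand-in:

```python
# Sketch: keying every committable by the same constant value sends them
# all to one downstream aggregator subtask, so no separate parallelism
# setting for the aggregator is needed.

PARALLELISM = 4  # illustrative downstream parallelism


def constant_key(_committable):
    return 0  # same key for every element -> same target subtask


def target_subtask(committable, key_selector, parallelism=PARALLELISM):
    # simplified stand-in for hash-partitioning by key
    return hash(key_selector(committable)) % parallelism


committables = ["file-a", "file-b", "file-c"]
targets = {target_subtask(c, constant_key) for c in committables}
```

With a non-constant key selector the same mechanism would instead spread committables across several aggregator subtasks.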
> >> > >
> >> > >
> >> > > 1. Burden for developers w.r.t. batch/stream unification.
> >> > >
> >> > > @yun @guowei, from a theoretical point of view you are right: by
> >> > > exposing the DataStream API in the sink, users have the full power
> >> > > to write correct batch and streaming sinks. I think in reality a
> >> > > lot of users still struggle to build pipelines with, e.g., the
> >> > > operator pipeline working correctly in both streaming and batch
> >> > > mode. Another problem I see with exposing deeper concepts is that
> >> > > we cannot do any optimization because we cannot reason about how
> >> > > sinks are built in the future.
> >> > >
> >> > > We should also try to steer users towards using only `Functions`
> >> > > to give us more flexibility to swap the internal operator
> >> > > representation. I agree with @yun that we should try to make the
> >> > > `ProcessFunction` more versatile to work towards that goal, but I
> >> > > see this as unrelated to this FLIP.
> >> > >
> >> > >
> >> > > 2. Regarding Commit / Global commit
> >> > >
> >> > > I envision the global committer to be specific to the data lake
> >> > > solution you want to write to. However, it is entirely orthogonal
> >> > > to the compaction. Currently, I do not expect any changes w.r.t.
> >> > > the global commit introduced by this FLIP.
> >> > >
> >> > >
> >> > > 3. Regarding the case of trans-checkpoint merging
> >> > >
> >> > > @yun, as a user, I would expect that if the committer receives
> >> > > files to merge/commit in a checkpoint, these are also finished when
> >> > > the checkpoint finishes. I think all sinks rely on this principle
> >> > > currently, i.e., the KafkaSink needs to commit all open
> >> > > transactions before the next checkpoint can happen.
> >> > >
> >> > > Maybe in the future we can somehow move the Committer#commit call
> >> > > to an asynchronous execution, but we should discuss it in a
> >> > > separate thread.
> >> > >
> >> > > > We probably should first describe the different causes of small
> >> > > > files and what problems this proposal was trying to solve. I
> >> > > > wrote a data shuffling proposal [1] for the Flink Iceberg sink
> >> > > > (shared with the Iceberg community [2]). It can address small
> >> > > > files problems due to skewed data distribution across Iceberg
> >> > > > table partitions. Streaming shuffling before the writers (to
> >> > > > files) is typically more efficient than post-write file
> >> > > > compaction (which involves read-merge-write). It is usually
> >> > > > cheaper to prevent a problem (small files) than to fix it.
> >> > >
> >> > >
> >> > > @steven you are raising a good point, although I think only using
> >> > > a customizable shuffle won't address the generation of small files.
> >> > > One assumption is that the sink generates at least one file per
> >> > > subtask, which can already be too many. Another problem is that
> >> > > with low checkpointing intervals, the files do not meet the
> >> > > required size. The latter point is probably addressable by changing
> >> > > the checkpoint interval, which might be inconvenient for some
> >> > > users.
> >> > >
> >> > > > The sink coordinator checkpoint problem (mentioned in option 1)
> >> > > > would be great if Flink can address it. In the spirit of source
> >> > > > (enumerator-reader) and sink (writer-coordinator) duality, the
> >> > > > sink coordinator checkpoint should happen after the writer
> >> > > > operator. This would be a natural fit to support the global
> >> > > > committer in FLIP-143. It is probably an orthogonal matter to
> >> > > > this proposal.
> >> > >
> >> > >
> >> > > To me the question here is what are the benefits of having a
> >> > > coordinator in comparison to a global committer/aggregator
> >> > > operator.
> >> > >
> >> > > > Personally, I am usually in favor of keeping streaming ingestion
> >> > > > (to data lake) relatively simple and stable. Also sometimes
> >> > > > compaction and sorting are performed together in data rewrite
> >> > > > maintenance jobs to improve read performance. In that case, the
> >> > > > value of compacting (in Flink streaming ingestion) diminishes.
> >> > >
> >> > >
> >> > > I agree it is always possible to have scheduled maintenance jobs
> >> > > taking care of your data, i.e., doing compaction. Unfortunately,
> >> > > the downside is that you have to compact your data after it is
> >> > > already available for other downstream consumers. I guess this can
> >> > > lead to all kinds of visibility problems. I am also surprised that
> >> > > you personally are a fan of this approach and, on the other hand,
> >> > > are developing the Iceberg sink, which goes somewhat against your
> >> > > mentioned principle of keeping the sink simple.
> >> > >
> >> > > > Currently, it is unclear from the doc and this thread where the
> >> > > > compaction is actually happening. Jingsong's reply described one
> >> > > > model:
> >> > > > writer (parallel) -> aggregator (single-parallelism compaction
> >> > > > planner) -> compactor (parallel) -> global committer
> >> > > > (single-parallelism)
> >> > >
> >> > >
> >> > > My idea of the topology is very similar to the one outlined by
> >> > > Jingsong. The compaction will happen in the committer operator.
> >> > >
> >> > > >
> >> > > > In the Iceberg community, the following model has been
> >> > > > discussed. It is better for Iceberg because it won't delay the
> >> > > > data availability.
> >> > > > writer (parallel) -> global committer for append (single
> >> > > > parallelism) -> compactor (parallel) -> global committer for
> >> > > > rewrite commit (single parallelism)
> >> > >
> >> > >
> >> > > From a quick glimpse, it seems that the exact same topology is
> >> > > possible to express with the committable aggregator, but this
> >> > > definitely depends on the exact setup.
> >> > >
> >> > > Best,
> >> > > Fabian
> >>