Please ignore my previous email; I didn't know Apache requires all the discussions to be "open".
On Tue, Nov 19, 2019, 5:40 PM Ying Zheng <yi...@uber.com> wrote:

> Hi Jun,
>
> Thank you very much for your feedback!
>
> Can we schedule a meeting in your Palo Alto office in December? I think a
> face-to-face discussion is much more efficient than email. Both Harsha
> and I can visit you. Satish may be able to join us remotely.
>
> On Fri, Nov 15, 2019 at 11:04 AM Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Satish and Harsha,
>>
>> The following is more detailed, high-level feedback on the KIP. Overall,
>> the KIP seems useful. The challenge is how to design it so that it's
>> general enough to support different ways of implementing this feature,
>> while also supporting existing features.
>>
>> 40. Local segment metadata storage: The KIP assumes that the metadata
>> for the archived log segments is cached locally on every broker, and it
>> provides a specific implementation of that local storage in the
>> framework. We should probably discuss this more. For example, some tier
>> storage providers may not want to cache the metadata locally and may
>> instead rely on a remote key/value store, if such a store is already
>> present. If a local store is used, there are different ways of
>> implementing it (e.g., customized local files, an embedded store like
>> RocksDB, etc.). An alternative design is to provide only an interface
>> for retrieving the tier segment metadata and leave the details of how
>> the metadata is obtained outside the framework.
>>
>> 41. RemoteStorageManager interface and its usage in the framework: I am
>> not sure the interface is general enough. For example, it seems that
>> RemoteLogIndexEntry is tied to one specific way of storing the metadata
>> in remote storage. The framework uses the listRemoteSegments() API in a
>> pull-based approach, but in other implementations a push-based approach
>> may be preferred. I don't have a concrete proposal yet.
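The alternative raised in point 40 (the framework depends only on an interface for retrieving tier segment metadata, leaving the backing store to the plugin) could look roughly like the sketch below. All names here are hypothetical illustrations, not APIs from the KIP:

```java
// Hypothetical sketch of the "interface only" alternative from point 40:
// the framework looks up segment metadata through a pluggable provider and
// never knows whether it is backed by local files, RocksDB, or a remote
// key/value store. None of these names come from the KIP itself.
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

interface RemoteSegmentMetadataProvider {
    /** Metadata for the remote segment covering the given offset, if any. */
    Optional<RemoteSegmentMetadata> metadataForOffset(String topic, int partition, long offset);

    /** All remote segments known for the partition (the pull-based listing). */
    List<RemoteSegmentMetadata> listRemoteSegments(String topic, int partition);
}

/** Minimal metadata holder; the fields are illustrative only. */
final class RemoteSegmentMetadata {
    final long baseOffset;
    final long endOffset;
    final String remoteLocation; // e.g. an object-store URI

    RemoteSegmentMetadata(long baseOffset, long endOffset, String remoteLocation) {
        this.baseOffset = baseOffset;
        this.endOffset = endOffset;
        this.remoteLocation = remoteLocation;
    }
}

/** Trivial in-memory implementation, standing in for any real backend. */
final class InMemoryMetadataProvider implements RemoteSegmentMetadataProvider {
    private final List<RemoteSegmentMetadata> segments = new ArrayList<>();

    void add(RemoteSegmentMetadata segment) {
        segments.add(segment);
    }

    @Override
    public Optional<RemoteSegmentMetadata> metadataForOffset(String topic, int partition, long offset) {
        // Linear scan is fine for a sketch; a real store would index by offset.
        return segments.stream()
                .filter(s -> s.baseOffset <= offset && offset <= s.endOffset)
                .findFirst();
    }

    @Override
    public List<RemoteSegmentMetadata> listRemoteSegments(String topic, int partition) {
        return new ArrayList<>(segments);
    }
}
```

With this shape, a local-cache implementation and a remote key/value-store implementation both sit behind the same interface, so the caching decision in point 40 stays out of the framework.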
>> But it would be useful to give this area some more thought and see
>> whether we can make the interface more general.
>>
>> 42. In the diagram, the RemoteLogManager sits side by side with the
>> LogManager. The KIP only discusses how fetch requests are handled
>> between the two layers. However, we should also consider how other
>> requests that touch the log are handled, e.g., list offsets by
>> timestamp, delete records, etc. Also, in this model it's not clear
>> which component is responsible for managing the log start offset: it
>> seems the log start offset could be changed by both the
>> RemoteLogManager and the LogManager.
>>
>> 43. There are quite a few existing features not covered by the KIP. It
>> would be useful to discuss each of them.
>> 43.1 I wouldn't say that compacted topics are rarely used and always
>> small. For example, Kafka Streams uses compacted topics for storing
>> state, and the topic can sometimes be large. While it might be OK not
>> to support compacted topics initially, it would be useful to have a
>> high-level idea of how they could be supported down the road, so that
>> we don't have to make incompatible API changes in the future.
>> 43.2 We need to discuss how EOS is supported. In particular, how is the
>> producer state integrated with the remote storage?
>> 43.3 Now that KIP-392 (allow consumers to fetch from the closest
>> replica) is implemented, we need to discuss how reading from a follower
>> replica works with tier storage.
>> 43.4 We need to discuss how JBOD is supported with tier storage.
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, Nov 8, 2019 at 12:06 AM Tom Bentley <tbent...@redhat.com> wrote:
>>
>> > Thanks for those insights Ying.
>> >
>> > On Thu, Nov 7, 2019 at 9:26 PM Ying Zheng <yi...@uber.com.invalid>
>> > wrote:
>> >
>> > > > Thanks, I missed that point.
>> > > > However, there's still a point at which consumer fetches start
>> > > > getting served from remote storage (even if that point isn't as
>> > > > soon as the local log retention time/size). This represents a kind
>> > > > of performance cliff edge, and what I'm really interested in is
>> > > > how easy it is for a consumer that falls off that cliff to catch
>> > > > up so that its fetches come from local storage again. Obviously
>> > > > this depends on all sorts of factors (like production rate and
>> > > > consumption rate), so it's not guaranteed (just as it's not
>> > > > guaranteed for Kafka today), but it would represent a new failure
>> > > > mode.
>> > >
>> > > As I explained in my last mail, it's a very rare case that a
>> > > consumer needs to read remote data. In our experience at Uber, this
>> > > only happens when the consumer service has had an outage for several
>> > > hours.
>> > >
>> > > There is no "performance cliff" as you assume. The remote storage is
>> > > actually faster than local disks in terms of bandwidth. Reading from
>> > > remote storage has higher latency than reading from local disk, but
>> > > since the consumer is catching up on several hours of data, it is
>> > > not sensitive to sub-second latency, and each remote read request
>> > > fetches a large amount of data, so the overall performance is better
>> > > than reading from local disks.
>> > >
>> > > > Another aspect I'd like to understand better is the effect that
>> > > > serving fetch requests from remote storage has on the broker's
>> > > > network utilization.
>> > > > If we're just trimming the amount of data held locally (without
>> > > > increasing the overall local+remote retention), then we're
>> > > > effectively trading disk bandwidth for network bandwidth when
>> > > > serving fetch requests from remote storage (which I understand to
>> > > > be a good thing, since brokers are often/usually disk bound). But
>> > > > if we're increasing the overall local+remote retention, then it's
>> > > > more likely that the network itself becomes the bottleneck. I
>> > > > appreciate this is all rather hand-wavy; I'm just trying to
>> > > > understand how this would affect broker performance, so I'd be
>> > > > grateful for any insights you can offer.
>> > >
>> > > Network bandwidth usage is a function of produce rate; it has
>> > > nothing to do with remote retention. Once the data has been shipped
>> > > to remote storage, you can keep it there for 1 day, 1 year, or 100
>> > > years without consuming any network resources.
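The catch-up question in this exchange can be made concrete with a back-of-envelope estimate: a lagging consumer drains its backlog only as fast as its read throughput exceeds the produce rate, regardless of whether the reads hit local or remote storage. The numbers below are illustrative assumptions, not measurements from Uber:

```java
// Back-of-envelope sketch of the catch-up discussion above.
// All rates and lag figures are illustrative assumptions.
class CatchUpEstimate {
    /**
     * Hours until a lagging consumer is back on local data, assuming it can
     * read faster than producers write. Returns -1 if it can never catch up.
     */
    static double hoursToCatchUp(double lagHours, double produceMBps, double consumeMBps) {
        if (consumeMBps <= produceMBps) return -1; // backlog only grows
        double backlogMB = lagHours * 3600 * produceMBps;
        double drainMBps = consumeMBps - produceMBps; // net backlog drain rate
        return backlogMB / drainMBps / 3600;
    }

    public static void main(String[] args) {
        // e.g. 6 hours behind, producers at 50 MB/s, consumer reading 200 MB/s
        System.out.println(hoursToCatchUp(6, 50, 200)); // prints 2.0
    }
}
```

At these assumed rates a consumer six hours behind is back on local data in about two hours; if its read throughput cannot exceed the produce rate, it never catches up, which is the same constraint Kafka has today without tiered storage.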