On Mon, May 18, 2020 at 5:24 PM Seunghyun Lee <sn...@apache.org> wrote:

> If the final goal is to make this as the server level configuration (and
> remove the table level in the future), I recommend adding this config
> inside of `StreamConfig` because it will be much easier to remove the
> config from StreamConfig because it's a map.
>
> As Subbu mentioned, we may need to put it in the table config if this
> feature needs to work for both offline/realtime.
>
> @Ting Is this feature will only be used for realtime?
>

Initially yes, the peer downloading feature will only be used in real-time
ingestion. Nothing prevents offline segment loading to use peer downloading
though (with proper design overcoming a few cases raised by @mcvsubbu
<mcvsu...@apache.org>) in future. So putting it in
SegmentsValidationAndRetentionConfig could be future proof -- deprecating a
config is not ideal but acceptable in this sense.

The current design does not address much on the offline segment story. But
overall, the benefits for offline tables are similar in the sense that
segments can still be downloadable when the deep store is unavailable for
some time period.



>
>
> On Mon, May 11, 2020 at 11:10 AM Subbu Subramaniam <mcvsu...@apache.org>
> wrote:
>
> > The goal of the config change in [arts other than StreamConfig is to
> > ensure that the config can be used for offline segments as well, if for
> > some reason download fails.
> >
> > However, a race condition can happen for offline segments that can be
> > dangerous. If a segment has been updated with a newer version, and
> server A
> > and B have old versions. Both of them get notified of the newer version.
> > They may try to fetch the segment and fail, and eventually fetch from
> each
> > other, and end up thinking that they have the newest version of the
> > segment.There can be other variants of this as well, with restarts of
> > server.
> >
> > I can think of some ways to fix this (e.g. in the segment update message,
> > send the crc of the new version), but these have not been fully thought
> of.
> > We need to vet these well before adopting these.
> >
> > I prefer option 1 since it introduces a single config.
> >
> > I would like to hear from @kishoreg and @npawar as well
> >
> > -Subbu
> >
> > On 2020/05/06 00:16:26, TING CHEN <tingc...@uber.com.INVALID> wrote:
> > > As part of the proposal
> > > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_PINOT_By-2Dpassing-2Bdeep-2Dstore-2Brequirement-2Bfor-2BRealtime-2Bsegment-2Bcompletion&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=5HWU4j1yzO0Dd4u753euNLdN_hyuGd4SHEssllc4sAY&m=2RLFABXWvcW1-9iPo8za5jQ05gjroffzi6X6Fwv193s&s=4PDasxjke_lypTn76s0wILlODFSY0ptfoHSu0cyCCxg&e=
> > >
> > > to bypass deep store for segment completion, I plan to add a new
> optional
> > > string field *peerSegmentDownloadScheme* to
> > > the SegmentsValidationAndRetentionConfig in the TableConfig. The value
> > can
> > > be *http* or *https*.
> > >
> > >    1. SplitSegmentCommitter
> > >    <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_incubator-2Dpinot_blob_31c55afdb6a40f98189308ce6292587ead9d0dec_pinot-2Dcore_src_main_java_org_apache_pinot_core_data_manager_realtime_SplitSegmentCommitter.java&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=5HWU4j1yzO0Dd4u753euNLdN_hyuGd4SHEssllc4sAY&m=2RLFABXWvcW1-9iPo8za5jQ05gjroffzi6X6Fwv193s&s=PwR38isPLlQEuKjFAEM6Ww3w1LD8_Goe3VbWiNSlxhc&e=
> > >
> > >    will check this value. If it exists, the segment committer will be
> > able to
> > >    finish segment commit successfully even if the upload to the segment
> > store
> > >    fails. The committer will report a special marker to the controller
> > about
> > >    the segment is available in peer servers.
> > >    2. When Pinot servers fail to download segments from the segment
> > store,
> > >    they can also check this field's value. If it exists, it can
> download
> > >    segments from peer servers using either HTTP and HTTPS segment
> > fetchers as
> > >    configured. (related PR
> > >    <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_incubator-2Dpinot_pull_5336&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=5HWU4j1yzO0Dd4u753euNLdN_hyuGd4SHEssllc4sAY&m=2RLFABXWvcW1-9iPo8za5jQ05gjroffzi6X6Fwv193s&s=vdR7nAjvQq715f8CQB2FRdCKW5vekmx5wF6D2moT2VE&e=
> > in review for
> > how
> > >    to discover such servers.)
> > >
> > > Note this is a table level config. We will test the new download
> behavior
> > > in realtime tables in incremental fashion. Once fully proven, this
> config
> > > can be upgraded to server level config.
> > >
> > > Please let me know if you have any questions on this. Thanks @mcvsubbu
> > for
> > > coming up with the idea and offline discussions.
> > >
> > > Ting Chen
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> > For additional commands, e-mail: dev-h...@pinot.apache.org
> >
> >
>

Reply via email to