Thanks for your comments. The download url zk Metadata format change is
part of the deepstore by-pass proposal. The design aims to improve on
status quo for two cases w.r.t deep store config:

   1. A deep store configured for Pinot cluster
   2. No deep store configured for Pinot -- not for heavy users of Pinot
   but still valuable for many small/medium size setup as seen from slack
   questions.

For case 2, because there is no deep store URI, there is a need for a new
download URI format in zkMetaData.
For case 1, our design proposal of controller and server interaction in
case of deep store outage is similar to your proposal. In particular,

   1. When the selected commit server fails to upload a segment *S* to the
   deep store, it will pass a special uri *U* to the controller.
   2. The controller will then save *U *in the down url of a segment
   metadata.
   3. When another server needs to download *S, *if it sees *U *it
   directly downloads from peer servers. Otherwise it first tries to download
   from deep store and if fails trying to download from peer servers.

Note that the above steps work for both case 1 and case 2.

To me, your proposal simplifies the logic above in step 3 assuming deep
store is configured. I.e., everytime the server needs to download *S, *it
always downloads from deepstore first if it fails, downloads *S* from peer.
My proposal adds a check from the uri format but overall it works with both
cases regardless if the deep store is configured or not.

As for segment download cost from peer servers, I agree it is expensive and
should be avoided for high traffic clusters. But it is not the main focus
of this discussion.


On Thu, Jun 11, 2020 at 4:18 PM Xiaotian Jiang <[email protected]> wrote:

> Let me elaborate more on my proposal:
>
> The problem we are trying to solve here is when deep storage has lower
> availability than Pinot, we should be able to leverage the segment
> replications on Pinot server to increase the availability of segment
> download.
> Here we should first try to download from deep storage, and only if it
> fails, we use peer download as the backup.
> It is possible that the deep storage URL does not exist because the
> segment upload failed, in which case we should download from the
> peers.
> Notice that in both cases, peer download should be modeled as a backup
> plan instead of the main way of downloading segments.
> Also, downloading segments from a server requires the server to
> compress and send the segments, which is not a cheap operation, and
> can cause performance impact on the server. So we should only use peer
> download if there is no other option, i.e. as a backup.
> If we model peer download as a backup plan, we should not overload the
> existing downloadUri to trigger it. Instead, we should try to download
> with the downloadUri first, and only if it fails (including the case
> where downloadUri does not exist), we try to download from peer.
>
> On Thu, Jun 11, 2020 at 3:26 PM TING CHEN <[email protected]>
> wrote:
> >
> > Our current design
> > <
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> >
> > does not add a new pinotFS for *peer://*. This is a very interesting
> > question though. The only operations our design uses today for peer "FS"
> is
> > essentially *copyToLocal()* in the pinotFS interface. Our design
> basically
> > has a class and a few supporting methods
> > <
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion#BypassingdeepstorerequirementforRealtimesegmentcompletion-EnablebesteffortsegmentuploadinSplitSegmentCommiteranddownloadsegmentfrompeerservers
> .>
> > implementing the above method. That is why we do not add a full-fledged
> > pinotFS subclass for this design.
> >
> >
> > On Thu, Jun 11, 2020 at 3:00 PM kishore g <[email protected]> wrote:
> >
> > > Also, will peer:// have an implementation of pinotFS ?
> > >
> > > On Thu, Jun 11, 2020 at 2:58 PM kishore g <[email protected]> wrote:
> > >
> > > > +1 peer.
> > > >
> > > > unrelated to this - do we support multiple URI's?
> > > >
> > > > On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam <
> [email protected]>
> > > > wrote:
> > > >
> > > >> Hey Ting,
> > > >>
> > > >> I like the URI in metadata as "peer:///segmentName". This way, the
> URI
> > > >> remains parsable and we can use the scheme to check for a segment
> > > fetcher,
> > > >>
> > > >> thanks
> > > >>
> > > >> -Subbu
> > > >>
> > > >> On 2020/06/10 01:09:25, TING CHEN <[email protected]>
> wrote:
> > > >> > As part of the deep store by-passing
> > > >> > <
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > > >> >
> > > >> > project, a server is allowed to download segments from peer Pinot
> > > >> servers
> > > >> > during Low Level Consumer (LLC) instead of deep segment stores. To
> > > >> enable
> > > >> > this feature, we plan to add a new URI format for the field
> > > >> > *segment.realtime.download.url
> > > >> > *in LLCRealtimeSegmentZKMetadata.
> > > >> >
> > > >> > The new URI format serves the purpose of instructing a Pinot
> server to
> > > >> find
> > > >> > and download the segment from a peer server. Controller writes it
> to
> > > >> Helix
> > > >> > in case of segment upload failure or no deep store configured at
> all.
> > > We
> > > >> > proposed the following format options and want to hear your
> feedback:
> > > >> >
> > > >> >    1. peer:///segmentName; (my preference)
> > > >> >    2. simply an empty string *''*
> > > >> >
> > > >> > Both are in essence specially markers to indicate that the
> segment is
> > > >> not
> > > >> > found in deep store and servers have to download them from peer
> > > servers.
> > > >> > (1) has the benefit of better readability than (2) for debugging
> > > >> purposes.
> > > >> >
> > > >> > Please let me know what you think.
> > > >> >
> > > >> > Ting Chen
> > > >> >
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: [email protected]
> > > >> For additional commands, e-mail: [email protected]
> > > >>
> > > >>
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to