OK, I think I see jackie's point. An empty URI simply means that we could not 
get it to store in deep store reliably. Architecturally, that makes sense. If a 
SegmentFetcher wants to fetch the segment and finds empty URI, then it needs to 
look at servers with ExrernalView being ONLINE for the segment and fetch it 
from there.

So, let us do this.
(1) The segment completion protoocol carries a URI as proposed by Ting. (We 
should not be able to distinguish between an invalid request that may be 
missing a URI vs a request that truly says that the URI is only with peers at 
this point in time).
(2) Controller commits the metadata with an empty URI if it sees a "peer" 
scheme in the committing segment URI
(3) Segment downloader assumes that an empty URI == download from peer.

The con side of this is that we need to check everywhere to make sure that we 
are handling null URI in the metadata. It is hard to catch this in review, so i 
would ask Ting to do the due diligence here, and also have a couple of 
reviewers just in case.

Regarding segment download, I have a proposal:
A. Update the SegmentFetcher interface to introduce a new method that takes a 
list of URIs instead of one URI. Default impl for this method can be to call 
the older API with any of the URIs
B. Modify the http segment fetcher to try a random URL if a list is given (or, 
cycle through, or whatever).

This makes Ting;s code simpler on the caller side

-Subbu

On 2020/06/12 05:26:41, TING CHEN <tingc...@uber.com.INVALID> wrote: 
> Thanks for your comments. The download url zk Metadata format change is
> part of the deepstore by-pass proposal. The design aims to improve on
> status quo for two cases w.r.t deep store config:
> 
>    1. A deep store configured for Pinot cluster
>    2. No deep store configured for Pinot -- not for heavy users of Pinot
>    but still valuable for many small/medium size setup as seen from slack
>    questions.
> 
> For case 2, because there is no deep store URI, there is a need for a new
> download URI format in zkMetaData.
> For case 1, our design proposal of controller and server interaction in
> case of deep store outage is similar to your proposal. In particular,
> 
>    1. When the selected commit server fails to upload a segment *S* to the
>    deep store, it will pass a special uri *U* to the controller.
>    2. The controller will then save *U *in the down url of a segment
>    metadata.
>    3. When another server needs to download *S, *if it sees *U *it
>    directly downloads from peer servers. Otherwise it first tries to download
>    from deep store and if fails trying to download from peer servers.
> 
> Note that the above steps work for both case 1 and case 2.
> 
> To me, your proposal simplifies the logic above in step 3 assuming deep
> store is configured. I.e., everytime the server needs to download *S, *it
> always downloads from deepstore first if it fails, downloads *S* from peer.
> My proposal adds a check from the uri format but overall it works with both
> cases regardless if the deep store is configured or not.
> 
> As for segment download cost from peer servers, I agree it is expensive and
> should be avoided for high traffic clusters. But it is not the main focus
> of this discussion.
> 
> 
> On Thu, Jun 11, 2020 at 4:18 PM Xiaotian Jiang <jackie....@gmail.com> wrote:
> 
> > Let me elaborate more on my proposal:
> >
> > The problem we are trying to solve here is when deep storage has lower
> > availability than Pinot, we should be able to leverage the segment
> > replications on Pinot server to increase the availability of segment
> > download.
> > Here we should first try to download from deep storage, and only if it
> > fails, we use peer download as the backup.
> > It is possible that the deep storage URL does not exist because the
> > segment upload failed, in which case we should download from the
> > peers.
> > Notice that in both cases, peer download should be modeled as a backup
> > plan instead of the main way of downloading segments.
> > Also, downloading segments from a server requires the server to
> > compress and send the segments, which is not a cheap operation, and
> > can cause performance impact on the server. So we should only use peer
> > download if there is no other option, i.e. as a backup.
> > If we model peer download as a backup plan, we should not overload the
> > existing downloadUri to trigger it. Instead, we should try to download
> > with the downloadUri first, and only if it fails (including the case
> > where downloadUri does not exist), we try to download from peer.
> >
> > On Thu, Jun 11, 2020 at 3:26 PM TING CHEN <tingc...@uber.com.invalid>
> > wrote:
> > >
> > > Our current design
> > > <
> > https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > >
> > > does not add a new pinotFS for *peer://*. This is a very interesting
> > > question though. The only operations our design uses today for peer "FS"
> > is
> > > essentially *copyToLocal()* in the pinotFS interface. Our design
> > basically
> > > has a class and a few supporting methods
> > > <
> > https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion#BypassingdeepstorerequirementforRealtimesegmentcompletion-EnablebesteffortsegmentuploadinSplitSegmentCommiteranddownloadsegmentfrompeerservers
> > .>
> > > implementing the above method. That is why we do not add a full-fledged
> > > pinotFS subclass for this design.
> > >
> > >
> > > On Thu, Jun 11, 2020 at 3:00 PM kishore g <g.kish...@gmail.com> wrote:
> > >
> > > > Also, will peer:// have an implementation of pinotFS ?
> > > >
> > > > On Thu, Jun 11, 2020 at 2:58 PM kishore g <g.kish...@gmail.com> wrote:
> > > >
> > > > > +1 peer.
> > > > >
> > > > > unrelated to this - do we support multiple URI's?
> > > > >
> > > > > On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam <
> > mcvsu...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Hey Ting,
> > > > >>
> > > > >> I like the URI in metadata as "peer:///segmentName". This way, the
> > URI
> > > > >> remains parsable and we can use the scheme to check for a segment
> > > > fetcher,
> > > > >>
> > > > >> thanks
> > > > >>
> > > > >> -Subbu
> > > > >>
> > > > >> On 2020/06/10 01:09:25, TING CHEN <tingc...@uber.com.INVALID>
> > wrote:
> > > > >> > As part of the deep store by-passing
> > > > >> > <
> > > > >>
> > > >
> > https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > > > >> >
> > > > >> > project, a server is allowed to download segments from peer Pinot
> > > > >> servers
> > > > >> > during Low Level Consumer (LLC) instead of deep segment stores. To
> > > > >> enable
> > > > >> > this feature, we plan to add a new URI format for the field
> > > > >> > *segment.realtime.download.url
> > > > >> > *in LLCRealtimeSegmentZKMetadata.
> > > > >> >
> > > > >> > The new URI format serves the purpose of instructing a Pinot
> > server to
> > > > >> find
> > > > >> > and download the segment from a peer server. Controller writes it
> > to
> > > > >> Helix
> > > > >> > in case of segment upload failure or no deep store configured at
> > all.
> > > > We
> > > > >> > proposed the following format options and want to hear your
> > feedback:
> > > > >> >
> > > > >> >    1. peer:///segmentName; (my preference)
> > > > >> >    2. simply an empty string *''*
> > > > >> >
> > > > >> > Both are in essence specially markers to indicate that the
> > segment is
> > > > >> not
> > > > >> > found in deep store and servers have to download them from peer
> > > > servers.
> > > > >> > (1) has the benefit of better readability than (2) for debugging
> > > > >> purposes.
> > > > >> >
> > > > >> > Please let me know what you think.
> > > > >> >
> > > > >> > Ting Chen
> > > > >> >
> > > > >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> > > > >> For additional commands, e-mail: dev-h...@pinot.apache.org
> > > > >>
> > > > >>
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> > For additional commands, e-mail: dev-h...@pinot.apache.org
> >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org

Reply via email to