instead of modifying http segment fetcher can we introduce a new FS implementation? SegmentFetcher should purely work on an FS implementation.
On Fri, Jun 12, 2020 at 10:03 AM Subbu Subramaniam <[email protected]> wrote: > OK, I think I see jackie's point. An empty URI simply means that we could > not get it to store in deep store reliably. Architecturally, that makes > sense. If a SegmentFetcher wants to fetch the segment and finds empty URI, > then it needs to look at servers with ExrernalView being ONLINE for the > segment and fetch it from there. > > So, let us do this. > (1) The segment completion protoocol carries a URI as proposed by Ting. > (We should not be able to distinguish between an invalid request that may > be missing a URI vs a request that truly says that the URI is only with > peers at this point in time). > (2) Controller commits the metadata with an empty URI if it sees a "peer" > scheme in the committing segment URI > (3) Segment downloader assumes that an empty URI == download from peer. > > The con side of this is that we need to check everywhere to make sure that > we are handling null URI in the metadata. It is hard to catch this in > review, so i would ask Ting to do the due diligence here, and also have a > couple of reviewers just in case. > > Regarding segment download, I have a proposal: > A. Update the SegmentFetcher interface to introduce a new method that > takes a list of URIs instead of one URI. Default impl for this method can > be to call the older API with any of the URIs > B. Modify the http segment fetcher to try a random URL if a list is given > (or, cycle through, or whatever). > > This makes Ting;s code simpler on the caller side > > -Subbu > > On 2020/06/12 05:26:41, TING CHEN <[email protected]> wrote: > > Thanks for your comments. The download url zk Metadata format change is > > part of the deepstore by-pass proposal. The design aims to improve on > > status quo for two cases w.r.t deep store config: > > > > 1. A deep store configured for Pinot cluster > > 2. No deep store configured for Pinot -- not for heavy users of Pinot > > but still valuable for many small/medium size setup as seen from slack > > questions. > > > > For case 2, because there is no deep store URI, there is a need for a new > > download URI format in zkMetaData. > > For case 1, our design proposal of controller and server interaction in > > case of deep store outage is similar to your proposal. In particular, > > > > 1. When the selected commit server fails to upload a segment *S* to > the > > deep store, it will pass a special uri *U* to the controller. > > 2. The controller will then save *U *in the down url of a segment > > metadata. > > 3. When another server needs to download *S, *if it sees *U *it > > directly downloads from peer servers. Otherwise it first tries to > download > > from deep store and if fails trying to download from peer servers. > > > > Note that the above steps work for both case 1 and case 2. > > > > To me, your proposal simplifies the logic above in step 3 assuming deep > > store is configured. I.e., everytime the server needs to download *S, *it > > always downloads from deepstore first if it fails, downloads *S* from > peer. > > My proposal adds a check from the uri format but overall it works with > both > > cases regardless if the deep store is configured or not. > > > > As for segment download cost from peer servers, I agree it is expensive > and > > should be avoided for high traffic clusters. But it is not the main focus > > of this discussion. > > > > > > On Thu, Jun 11, 2020 at 4:18 PM Xiaotian Jiang <[email protected]> > wrote: > > > > > Let me elaborate more on my proposal: > > > > > > The problem we are trying to solve here is when deep storage has lower > > > availability than Pinot, we should be able to leverage the segment > > > replications on Pinot server to increase the availability of segment > > > download. > > > Here we should first try to download from deep storage, and only if it > > > fails, we use peer download as the backup. > > > It is possible that the deep storage URL does not exist because the > > > segment upload failed, in which case we should download from the > > > peers. > > > Notice that in both cases, peer download should be modeled as a backup > > > plan instead of the main way of downloading segments. > > > Also, downloading segments from a server requires the server to > > > compress and send the segments, which is not a cheap operation, and > > > can cause performance impact on the server. So we should only use peer > > > download if there is no other option, i.e. as a backup. > > > If we model peer download as a backup plan, we should not overload the > > > existing downloadUri to trigger it. Instead, we should try to download > > > with the downloadUri first, and only if it fails (including the case > > > where downloadUri does not exist), we try to download from peer. > > > > > > On Thu, Jun 11, 2020 at 3:26 PM TING CHEN <[email protected]> > > > wrote: > > > > > > > > Our current design > > > > < > > > > https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion > > > > > > > > does not add a new pinotFS for *peer://*. This is a very interesting > > > > question though. The only operations our design uses today for peer > "FS" > > > is > > > > essentially *copyToLocal()* in the pinotFS interface. Our design > > > basically > > > > has a class and a few supporting methods > > > > < > > > > https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion#BypassingdeepstorerequirementforRealtimesegmentcompletion-EnablebesteffortsegmentuploadinSplitSegmentCommiteranddownloadsegmentfrompeerservers > > > .> > > > > implementing the above method. That is why we do not add a > full-fledged > > > > pinotFS subclass for this design. > > > > > > > > > > > > On Thu, Jun 11, 2020 at 3:00 PM kishore g <[email protected]> > wrote: > > > > > > > > > Also, will peer:// have an implementation of pinotFS ? > > > > > > > > > > On Thu, Jun 11, 2020 at 2:58 PM kishore g <[email protected]> > wrote: > > > > > > > > > > > +1 peer. > > > > > > > > > > > > unrelated to this - do we support multiple URI's? > > > > > > > > > > > > On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam < > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > >> Hey Ting, > > > > > >> > > > > > >> I like the URI in metadata as "peer:///segmentName". This way, > the > > > URI > > > > > >> remains parsable and we can use the scheme to check for a > segment > > > > > fetcher, > > > > > >> > > > > > >> thanks > > > > > >> > > > > > >> -Subbu > > > > > >> > > > > > >> On 2020/06/10 01:09:25, TING CHEN <[email protected]> > > > wrote: > > > > > >> > As part of the deep store by-passing > > > > > >> > < > > > > > >> > > > > > > > > > https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion > > > > > >> > > > > > > >> > project, a server is allowed to download segments from peer > Pinot > > > > > >> servers > > > > > >> > during Low Level Consumer (LLC) instead of deep segment > stores. To > > > > > >> enable > > > > > >> > this feature, we plan to add a new URI format for the field > > > > > >> > *segment.realtime.download.url > > > > > >> > *in LLCRealtimeSegmentZKMetadata. > > > > > >> > > > > > > >> > The new URI format serves the purpose of instructing a Pinot > > > server to > > > > > >> find > > > > > >> > and download the segment from a peer server. Controller > writes it > > > to > > > > > >> Helix > > > > > >> > in case of segment upload failure or no deep store configured > at > > > all. > > > > > We > > > > > >> > proposed the following format options and want to hear your > > > feedback: > > > > > >> > > > > > > >> > 1. peer:///segmentName; (my preference) > > > > > >> > 2. simply an empty string *''* > > > > > >> > > > > > > >> > Both are in essence specially markers to indicate that the > > > segment is > > > > > >> not > > > > > >> > found in deep store and servers have to download them from > peer > > > > > servers. > > > > > >> > (1) has the benefit of better readability than (2) for > debugging > > > > > >> purposes. > > > > > >> > > > > > > >> > Please let me know what you think. > > > > > >> > > > > > > >> > Ting Chen > > > > > >> > > > > > > >> > > > > > >> > > > --------------------------------------------------------------------- > > > > > >> To unsubscribe, e-mail: [email protected] > > > > > >> For additional commands, e-mail: [email protected] > > > > > >> > > > > > >> > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [email protected] > > > For additional commands, e-mail: [email protected] > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
