Re: [VOTE] Graduate Apache Pinot as TLP

2021-06-24 Thread TING CHEN
+1

On Thu, Jun 24, 2021 at 12:44 PM siddharth teotia 
wrote:

> + 1
>
> On Thu, Jun 24, 2021 at 12:13 AM Kevin Ratnasekera <
> djkevincr1...@gmail.com> wrote:
>
>> +1  ( binding )
>>
>> On Thu, Jun 24, 2021 at 12:32 PM Furkan KAMACI 
>> wrote:
>>
>>> Hi,
>>>
>>> +1 (binding).
>>>
>>> Good luck!
>>>
>>> Kind Regards,
>>> Furkan KAMACI
>>>
>>> On Thu, Jun 24, 2021 at 9:06 AM vino yang  wrote:
>>>
>>> > +1
>>> >
>>> > On Thu, Jun 24, 2021 at 1:44 PM Abhishek Tiwari wrote:
>>> >
>>> > > +1 (non-binding)
>>> > >
>>> > > On Wed, Jun 23, 2021 at 10:34 PM Xiang Fu 
>>> > wrote:
>>> > >
>>> > > > +1
>>> > > >
>>> > > > Xiang Fu
>>> > > >
>>> > > >
>>> > > > > On Jun 23, 2021, at 9:21 PM, Atri Sharma 
>>> wrote:
>>> > > > >
>>> > > > > +1(binding)
>>> > > > >
>>> > > > > On Wed, Jun 23, 2021 at 1:14 AM Mayank Shrivastava <
>>> > maya...@apache.org
>>> > > >
>>> > > > wrote:
>>> > > > >>
>>> > > > >> Dear Incubator Community,
>>> > > > >>
>>> > > > >> We have discussed Apache Pinot Podling graduation in the
>>> > > > general@incubator
>>> > > > >> DISCUSS thread [1], and addressed all the questions and concerns
>>> > > > brought up
>>> > > > >> in the thread. Please refer to [1] for details on the questions
>>> and
>>> > > > >> concerns brought up, as well as their resolutions. With no
>>> > objections
>>> > > > >> brought up in the discussion, we would like to proceed with the
>>> > voting
>>> > > > >> process.
>>> > > > >>
>>> > > > >> Here is the official vote for graduating Apache Pinot project as
>>> > TLP.
>>> > > > >>
>>> > > > >> Please provide your vote in the following options:
>>> > > > >>
>>> > > > >> [ ] +1 - Recommend graduation of Apache Pinot as a TLP
>>> > > > >>
>>> > > > >> [ ]  0 - I don't feel strongly about it, but don't object
>>> > > > >>
>>> > > > >> [ ] -1 - Do not recommend the graduation of Apache Pinot
>>> because…
>>> > > > >>
>>> > > > >> The VOTE will remain open for at least 72 hours.
>>> > > > >>
>>> > > > >> To summarize a few of the community's achievements:
>>> > > > >>
>>> > > > >>
>>> > > > >>   - 7800+ contributions from 168 contributors
>>> > > > >>   - 7 releases by various release managers
>>> > > > >>   - 6 new committers and 2 new PPMCs invited (all accepted)
>>> > > > >>   - Diverse committers and PPMCs (from 7 companies/institutes)
>>> > > > >>   - Apache website setup [4]
>>> > > > >>   - Dev conversations at dev@pinot.apache.org
>>> > > > >>   - Assessed ourselves against the Apache Project maturity matrix [5]
>>> > > > >>   - We have built a meritocratic and open collaborative process (the
>>> > > > >>     Apache way)
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> =
>>> > > > >>
>>> > > > >> Establish the Apache Pinot Project
>>> > > > >>
>>> > > > >> WHEREAS, the Board of Directors deems it to be in the best
>>> interests
>>> > > of
>>> > > > the
>>> > > > >> Foundation and consistent with the Foundation's purpose to
>>> > establish a
>>> > > > >> Project Management Committee charged with the creation and
>>> > maintenance
>>> > > > of
>>> > > > >> open-source software, for distribution at no charge to the
>>> public,
>>> > > > related
>>> > > > >> to a distributed data integration framework that simplifies
>>> common
>>> > > > aspects
>>> > > > >> of big data integration such as data ingestion, replication,
>>> > > > organization
>>> > > > >> and lifecycle management for both streaming and batch data
>>> > ecosystems.
>>> > > > >>
>>> > > > >> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>> Committee
>>> > > > (PMC),
>>> > > > >> to be known as the "Apache Pinot Project", be and hereby is
>>> > > established
>>> > > > >> pursuant to Bylaws of the Foundation; and be it further
>>> > > > >>
>>> > > > >> RESOLVED, that the Apache Pinot Project be and hereby is
>>> responsible
>>> > > for
>>> > > > >> the creation and maintenance of software related to distributed
>>> OLAP
>>> > > > data
>>> > > > >> store to provide Real-time Analytics to power wide variety of
>>> > > analytical
>>> > > > >> use case; and be it further
>>> > > > >>
>>> > > > >> RESOLVED, that the office of "Vice President, Apache Pinot" be
>>> and
>>> > > > hereby
>>> > > > >> is created, the person holding such office to serve at the
>>> direction
>>> > > of
>>> > > > the
>>> > > > >> Board of Directors as the chair of the Apache Pinot Project,
>>> and to
>>> > > have
>>> > > > >> primary responsibility for management of the projects within the
>>> > scope
>>> > > > of
>>> > > > >> responsibility of the Apache Pinot Project; and be it further
>>> > > > >>
>>> > > > >> RESOLVED, that the persons listed immediately below be and
>>> hereby
>>> > are
>>> > > > >> appointed to serve as the initial members of the Apache 

Re: [VOTE] Apache Pinot graduation to a TLP

2021-06-04 Thread TING CHEN
+1

On Fri, Jun 4, 2021 at 10:25 AM siddharth teotia 
wrote:

> +1
>
> On Fri, Jun 4, 2021 at 10:21 AM Yupeng Fu  wrote:
>
>> +1
>>
>> On Fri, Jun 4, 2021 at 10:20 AM Seunghyun Lee  wrote:
>>
>>> +1
>>>
>>> Best,
>>> Seunghyun
>>>
>>> On Fri, Jun 4, 2021 at 10:14 AM Mayank Shrivastava 
>>> wrote:
>>>
 +1

 Adding permalink of the dev discussion:
 https://lists.apache.org/thread.html/r6068cae91a474e86595cc02d90701501d22c08274216301facf935cc%40%3Cdev.pinot.apache.org%3E
 And carrying over the votes from the dev discussion:

 Mentors:
 +1 Felix Cheung
 +1 Kishore G
 +1 Olivier Lamy
 +1 Jim Jagielski

 PPMC:
 +1 Mayank Shrivastava
 +1 Seunghyun Lee
 +1 Xiang Fu
 +1 Subbu Subramaniam

 Committers:
 +1 Yupeng Fu

 On Fri, Jun 4, 2021 at 10:00 AM Subbu Subramaniam 
 wrote:

> +1
>
> -Subbu
>
> On 2021/06/04 16:55:14, Mayank Shrivastava 
> wrote:
> > Hello all,
> >
> > As per our discussion on the dev mailing list, I would like to
> > call a VOTE for Apache Pinot graduating as a top level Apache project.
> >
> > If this vote passes, the next step would be to submit the resolution
> > below to the Incubator PMC, who would vote on sending it on to the
> > Apache Board.
> >
> > Vote:
> >
> > [ ] +1 - Recommend graduation of Apache Pinot as a TLP
> >
> > [ ] -1 - Do not recommend the graduation of Apache Pinot because...
> >
> > The VOTE is open for a minimum of 72 hours.
> >
> > Establish the Apache Pinot Project
> >
> > WHEREAS, the Board of Directors deems it to be in the best interests
> of the
> > Foundation and consistent with the Foundation's purpose to establish
> a
> > Project Management Committee charged with the creation and
> maintenance of
> > open-source software, for distribution at no charge to the public,
> related
> > to a distributed data integration framework that simplifies common
> aspects
> > of big data integration such as data ingestion, replication,
> organization
> > and lifecycle management for both streaming and batch data
> ecosystems.
> >
> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
> (PMC),
> > to be known as the "Apache Pinot Project", be and hereby is
> established
> > pursuant to Bylaws of the Foundation; and be it further
> >
> > RESOLVED, that the Apache Pinot Project be and hereby is responsible
> for
> > the creation and maintenance of software related to distributed OLAP
> data
> > store to provide Real-time Analytics to power wide variety of
> analytical
> > use case; and be it further
> >
> > RESOLVED, that the office of "Vice President, Apache Pinot" be and
> hereby
> > is created, the person holding such office to serve at the direction
> of the
> > Board of Directors as the chair of the Apache Pinot Project, and to
> have
> > primary responsibility for management of the projects within the
> scope of
> > responsibility of the Apache Pinot Project; and be it further
> >
> > RESOLVED, that the persons listed immediately below be and hereby are
> > appointed to serve as the initial members of the Apache Pinot
> Project:
> >
> >   - Felix Cheung
> >   - Jackie Jiang
> >   - Jim Jagielski
> >   - Kishore G
> >   - Mayank Shrivastava
> >   - Neha Pawar
> >   - Olivier Lamy
> >   - Seunghyun Lee
> >   - Siddharth Teotia
> >   - Subbu Subramaniam
> >   - Xiang Fu
> >
> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Kishore Gopalakrishna be
> > appointed to the office of Vice President, Apache Pinot, to serve in
> > accordance with and subject to the direction of the Board of Directors
> > and the Bylaws of the Foundation until death, resignation, retirement,
> > removal or disqualification, or until a successor is appointed; and be
> > it further
> >
> > RESOLVED, that the Apache Pinot Project be and hereby is tasked with
> the
> > migration and rationalization of the Apache Incubator Pinot podling;
> and be
> > it further
> >
> > RESOLVED, that all responsibilities pertaining to the Apache
> Incubator
> > Pinot podling encumbered upon the Apache Incubator PMC are hereafter
> > discharged.
> >
>

Re: Removing PQL endpoint

2021-02-19 Thread TING CHEN
Uber uses both PQL and Presto to query Pinot. For Presto, the queries are
also translated to PQL for now. -Ting


On Fri, Feb 19, 2021 at 3:51 PM Mayank Shrivastava
 wrote:

> I remember checking with Uber team a while back (forget if Haibo or
> Yupeng), I was told that Uber uses Presto to query Pinot, so it should not
> matter?
>
> Regards,
> Mayank
> --
> *From:* Yupeng Fu 
> *Sent:* Friday, February 19, 2021 3:33 PM
> *To:* dev@pinot.apache.org 
> *Cc:* Ujwala Tulshigiri ; Girish Baliga ;
> Yupeng Fu 
> *Subject:* Re: Removing PQL endpoint
>
> +1 to what Ting suggested. The Presto to SQL migration from Uber side
> still needs a few more months.
>
> Alternatively, could we have a config to disable (and deprecate) the
> endpoint first, with the default value disabled? So the endpoint removal
> can be done together with the PQL cleanup.
>
> Thanks,
>
> On Fri, Feb 19, 2021 at 3:13 PM TING CHEN 
> wrote:
>
> Hi Sidd,
> Uber still uses PQL extensively with a few hundred tables and dozens
> of use cases. It takes us time to move out of PQL. End of Feb is too tight
> for us to complete the migration process. Can you postpone the removal of
> the query endpoints at least to the end of June so that we can complete the
> migration?
>
> Thanks,
> Ting
>
> On Thu, Feb 18, 2021 at 4:06 PM Siddharth Teotia
>  wrote:
>
> Hi All,
>
> It's been a while since Pinot has moved to SQL compliant syntax and
> semantics. Calcite SQL compiler has allowed us to move to standard SQL
> syntax and we will continue to leverage it for parsing, compiling and
> optimizing queries as more complex query functionality is added.
>
> However, with legacy PQL code existing, we need to put double effort when
> adding new query functionality to ensure it works for both PQL and SQL. It
> hurts dev productivity. Since SQL is the path forward, we need to start
> removing PQL from Pinot codebase.
>
> Please see this issue created in August last year
> https://github.com/apache/incubator-pinot/issues/5807 proposing
> deprecation of PQL.
>
> As a first step, we would be removing the PQL query endpoint on broker
> (/query) and controller (/pql) by end of Feb. This will ensure that users
> can't use PQL to query Pinot. The follow-up cleanup of PQL from the engine
> (parser, execution engine) will be a subsequent task.
>
> Please let us know if you have any questions.
>
> Thanks
> Sidd
>
>
>
> --
> --Yupeng
>


Re: Removing PQL endpoint

2021-02-19 Thread TING CHEN
Hi Sidd,
Uber still uses PQL extensively with a few hundred tables and dozens of
use cases. It takes us time to move out of PQL. End of Feb is too tight for
us to complete the migration process. Can you postpone the removal of the
query endpoints at least to the end of June so that we can complete the
migration?

Thanks,
Ting

On Thu, Feb 18, 2021 at 4:06 PM Siddharth Teotia
 wrote:

> Hi All,
>
> It's been a while since Pinot has moved to SQL compliant syntax and
> semantics. Calcite SQL compiler has allowed us to move to standard SQL
> syntax and we will continue to leverage it for parsing, compiling and
> optimizing queries as more complex query functionality is added.
>
> However, with legacy PQL code existing, we need to put double effort when
> adding new query functionality to ensure it works for both PQL and SQL. It
> hurts dev productivity. Since SQL is the path forward, we need to start
> removing PQL from Pinot codebase.
>
> Please see this issue created in August last year
> https://github.com/apache/incubator-pinot/issues/5807 proposing
> deprecation of PQL.
>
> As a first step, we would be removing the PQL query endpoint on broker
> (/query) and controller (/pql) by end of Feb. This will ensure that users
> can't use PQL to query Pinot. The follow-up cleanup of PQL from the engine
> (parser, execution engine) will be a subsequent task.
>
> Please let us know if you have any questions.
>
> Thanks
> Sidd
>


[ANNOUNCE] Apache Pinot (incubating) 0.5.0 released

2020-09-10 Thread Ting Chen
Hello community,

We are pleased to announce that Apache Pinot (incubating) 0.5.0 is released!

Apache Pinot (incubating) is a distributed columnar storage engine that can
ingest data in realtime and serve analytical queries at low latency.

The release can be downloaded at:
https://downloads.apache.org/incubator/pinot/apache-pinot-incubating-0.5.0/

The release notes are available at:
https://docs.pinot.apache.org/releases/0.5.0


Additional resources -
Project website: https://pinot.apache.org
Getting started: https://docs.pinot.apache.org/getting-started
Mailing list: dev@pinot.apache.org
Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
Twitter: https://twitter.com/ApachePinot

Best Regards,

Apache Pinot (incubating) Team


[RESULT][VOTE] Apache Pinot (incubating) 0.5.0 RC2

2020-09-04 Thread Ting Chen
Thanks to everyone for validating the release candidate. This vote is now
closed.

Apache Pinot (incubating) 0.5.0 RC2 has passed with 3 +1 (binding) votes and
no 0 or -1 votes.

+1 Kartik Khare
+1 Kishore Gopalakrishna
+1 Jim Jagielski

Best,
Ting Chen


[VOTE] Apache Pinot (incubating) 0.5.0 RC2

2020-09-02 Thread Ting Chen
Hi Pinot Community,

This is a call for a vote to release Apache Pinot (incubating) version
0.5.0.

The release candidate:
https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.5.0-rc2

Diff with apache-pinot-incubating-0.5.0-rc1
https://github.com/apache/incubator-pinot/commit/d902c1a (Remove the
node_modules codes from the src package)

Git tag for this release:
https://github.com/apache/incubator-pinot/tree/release-0.5.0-rc2

Git hash for this release:
d87bbc9032c6efe626eb5f9ef1db4de7aa067179

The artifacts have been signed with a key: C650A5210408F8F4, which can be
found in the following KEYS file.
https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS

Release notes:
https://github.com/apache/incubator-pinot/releases/tag/release-0.5.0-rc2

Staging repository:
https://repository.apache.org/content/repositories/orgapachepinot-1016

Documentation on verifying a release candidate:
https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate

*Special notes for verification*:
During this step "diff -r apache-pinot-incubating-${VERSION}-src
pinot-git-src", due to Cluster Manager UI & Query Console UI revamp
(#5684), you might see the extra lines below. It is fine because they are
related to the revamped controller UI.

Only in
apache-pinot-incubating-0.5.0-src/pinot-controller/src/main/resources: dist
Only in
apache-pinot-incubating-0.5.0-src/pinot-controller/src/main/resources:
package-lock.json
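
As a rough aid for the step above, here is a minimal Python sketch that filters the output of the "diff -r" command and reports anything other than the two expected "Only in" entries. The function name and constants are hypothetical and not part of any Pinot tooling; the sketch assumes the output uses the usual one-line "Only in DIR: NAME" form.

```python
# Hypothetical helper, not part of the Pinot release tooling.
RESOURCES_DIR = (
    "apache-pinot-incubating-0.5.0-src/pinot-controller/src/main/resources"
)
# Entries the vote email says are fine to see for 0.5.0 RC2.
EXPECTED_ONLY_IN = {"dist", "package-lock.json"}


def unexpected_diff_lines(diff_output: str) -> list:
    """Return the lines of `diff -r` output that are not expected extras."""
    prefix = f"Only in {RESOURCES_DIR}: "
    unexpected = []
    for raw_line in diff_output.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(prefix) and line[len(prefix):] in EXPECTED_ONLY_IN:
            continue  # an extra entry from the revamped controller UI
        unexpected.append(line)
    return unexpected
```

An empty result would mean the source package and the git checkout differ only by the entries listed above.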


The vote will be open for at least 72 hours or until a necessary number of
votes are reached.

Please vote accordingly,

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove with the reason

Thanks,
Apache Pinot (incubating) team


[RESULT][VOTE] Apache Pinot (incubating) 0.5.0 RC1

2020-08-31 Thread Ting Chen
Thanks to everyone for validating the release candidate. This vote is now
closed.

Apache Pinot (incubating) 0.5.0 RC1 has passed with 4 +1 (binding) votes and
no 0 or -1 votes.

Binding
+1 Kishore Gopalakrishna
+1 Haibo Wang
+1 Mayank Shrivastava
+1 Xiaotian Jiang

Best,
Ting Chen


[VOTE] Apache Pinot (incubating) 0.5.0 RC1

2020-08-28 Thread Ting Chen
Hi Pinot Community,

This is a call for a vote to release Apache Pinot (incubating) version
0.5.0.

The release candidate:
https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.5.0-rc1

Git tag for this release:
https://github.com/apache/incubator-pinot/tree/release-0.5.0-rc1

Git hash for this release:
3c40ea207f84a36aa60bc2ff9987431d2d746222

The artifacts have been signed with a key: C650A5210408F8F4, which can be
found in the following KEYS file.
https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS

Release notes:
https://github.com/apache/incubator-pinot/releases/tag/release-0.5.0-rc1

Staging repository:
https://repository.apache.org/content/repositories/orgapachepinot-1015

Documentation on verifying a release candidate:
https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate

*Special notes for verification*: During this step "diff -r
apache-pinot-incubating-${VERSION}-src pinot-git-src", due to Cluster
Manager UI & Query Console UI revamp (#5684), you might see the extra lines
below. It is fine because they are generated code for the revamped
controller UI.

Only in
apache-pinot-incubating-0.5.0-src/pinot-controller/src/main/resources: dist

Only in
apache-pinot-incubating-0.5.0-src/pinot-controller/src/main/resources:
node_modules

Only in
apache-pinot-incubating-0.5.0-src/pinot-controller/src/main/resources:
package-lock.json



The vote will be open for at least 72 hours or until a necessary number of
votes are reached.

Please vote accordingly,

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove with the reason

Thanks,
Apache Pinot (incubating) team


Call for features you want listed in the Pinot 0.5 release

2020-07-31 Thread TING CHEN
  As I plan to release Pinot 0.5 in the coming 1 or 2 weeks, can you send
me any major feature you want to announce in the release note? Please reply
to this email thread with a short 1-3 sentence summary of the features
added.

Thanks,
Ting Chen


Re: new Metadata format for download url in LLCRealtimeSegmentZKMetadata

2020-06-16 Thread TING CHEN
Thanks Subbu. Let me summarize the discussion so far on this thread and
some questions/todos raised earlier:

1) Metadata format change for download url in *LLCRealtimeSegmentZKMetadata*
on top of today's deepstore uris.
*Options*: *Empty string* or a new scheme *peer:///segmentName*.
*LLCRealtimeSegmentZKMetadata.getDownloadUrl()* is only used by
*RealtimeTableDataManager.downloadAndReplaceSegment()*, so all the url
checking can be centralized there. The main pro of using an empty string is
that we only have two main classes of url formats: a valid deepstore uri or
an empty string. This simplifies the download logic at the expense of
debuggability (because we cannot tell whether an empty download url is an
error or was set intentionally). The pros and cons of using
*peer:///segmentName* are exactly the opposite, because we would be
introducing 3 possible classes of url values: empty, peer and deepstore.
 *Decision*: Use *empty string* so that we have 2 classes of uri
formats and a unified and clear real time segment download strategy: i.e.,
download based on the download url and, if that fails, use peer download as
the backup.

   - Note that OfflineSegment also has download url in its metadata and its
   own download method. We should make it a TODO to unify them somehow in a
   follow-up design instead of the current discussion.
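
For illustration only, here is a minimal Python sketch of the download strategy described in the decision above: try the recorded download url first, and use peer download as the backup when the url is empty or the download fails. Pinot itself is written in Java, and every name here (download_segment, fetch_from_deep_store, fetch_from_peer, peer_download_enabled) is a hypothetical stand-in rather than an existing Pinot API.

```python
from typing import Callable


def download_segment(
    segment_name: str,
    download_url: str,
    fetch_from_deep_store: Callable[[str], bytes],
    fetch_from_peer: Callable[[str], bytes],
    peer_download_enabled: bool = True,
) -> bytes:
    """Download via the recorded url; fall back to a peer if that is not possible.

    Only two classes of url are assumed: a deep store uri or an empty string.
    """
    if download_url:
        try:
            return fetch_from_deep_store(download_url)
        except IOError:
            # Deep store download failed; re-raise unless the backup is allowed.
            if not peer_download_enabled:
                raise
    elif not peer_download_enabled:
        raise ValueError(f"segment {segment_name} has no download url")
    # Backup path: empty url, or the deep store download failed.
    return fetch_from_peer(segment_name)
```

Note that in this shape an empty url and a failed deep store download end up on the same path, which is the simplification (and the debuggability cost) discussed above.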

2) Peer segment download method.
A few issues were raised in the thread, but I feel the best place to discuss
them is in the filed PR
<https://github.com/apache/incubator-pinot/pull/5336> and the design doc.
Let me summarize the issues and give my opinion on them:

   - Use *SegmentFetcher* or *PinotFS*: The two interface/abstract classes
   are very similar -- do we have a plan to merge them btw? To me the main
   functional requirement is the ability to load balance downloads across
   different servers. Implementation wise, we have a few options: (1) My PR
   right now uses a util class to pick a random server and lets the http(s)
   segment fetcher download from it; (2) Subbu proposed to let SegmentFetcher
   pick the random url by adding a new interface method; (3) SegmentFetcher
   has a subclass *PinotFSSegmentFetcher* through which we can abstract all
   the download logic via a new PeerPinotFS and more... My preference is
   toward a straightforward one like (1) or (2) for now. A full PinotFS on
   peer servers makes more sense when the offline segment support is also
   ready.
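
To make options (1) and (2) a bit more concrete, here is a small Python sketch of fetching from a list of candidate peer URIs in random order. It is an illustration under assumptions, not the PR's code: fetch_from_any and fetch_one are hypothetical names, and trying the peers in a shuffled order (each at most once) is just one variation of "pick a random one and retry".

```python
import random
from typing import Callable, Optional, Sequence


def fetch_from_any(
    uris: Sequence[str],
    fetch_one: Callable[[str], bytes],
    rng: Optional[random.Random] = None,
) -> bytes:
    """Try the candidate URIs in random order and return the first success."""
    if not uris:
        raise ValueError("no candidate URIs to download from")
    rng = rng or random.Random()
    last_error = None
    # A shuffled copy spreads load across peers and tries each one only once.
    for uri in rng.sample(list(uris), len(uris)):
        try:
            return fetch_one(uri)
        except IOError as err:
            last_error = err  # remember the failure and move on to the next peer
    raise IOError(f"download failed from all {len(uris)} URIs") from last_error
```

With a list of size 1 this degenerates to the existing single-URI call, which is roughly the default behavior suggested for the new interface method.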



On Mon, Jun 15, 2020 at 8:48 AM Subbu Subramaniam 
wrote:

> No, I mean the following:
> (1) Segment download URI in metadata will have either a valid URI (from
> deepstore) OR be empty.
> (2) In case it is empty, we construct the URIs of all selected peers and
> pass it to the segment fetcher
> (3) We add a new method to the SegmentFetcher interface that takes a list
> of URIs instead of a single URI
> (4) We modify the retry logic in the base class to pick a random one from
> the list (even if the list size is 1).
> (5) default implementation for the list could be to take a random URI (or
> the first one, or whatever) from the list and call the existing method of
> one URI
>
> =Subbu
>
> On 2020/06/11 22:09:05, TING CHEN  wrote:
> > You mean multiple URIs in a segment's download url? No, not for this project.
> >
> > On Thu, Jun 11, 2020 at 2:59 PM kishore g  wrote:
> >
> > > +1 peer.
> > >
> > > unrelated to this - do we support multiple URI's?
> > >
> > > On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam  >
> > > wrote:
> > >
> > > > Hey Ting,
> > > >
> > > > I like the URI in metadata as "peer:///segmentName". This way, the
> URI
> > > > remains parsable and we can use the scheme to check for a segment
> > > fetcher,
> > > >
> > > > thanks
> > > >
> > > > -Subbu
> > > >
> > > > On 2020/06/10 01:09:25, TING CHEN  wrote:
> > > > > As part of the deep store by-passing
> > > > > <
> > > >
> > >
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > > > >
> > > > > project, a server is allowed to download segments from peer Pinot
> > > servers
> > > > > during Low Level Consumer (LLC) instead of deep segment stores. To
> > > enable
> > > > > this feature, we plan to add a new URI format for the field
> > > > > > *segment.realtime.download.url* in LLCRealtimeSegmentZKMetadata.
> > > > >
> > > > > The new URI format serves the purpose of instructing a Pinot
> server to
> > > > find
> > > > > and download the segment from a peer server

Re: [Vote] Enabling html for Pinot related mailing lists

2020-06-15 Thread TING CHEN
+1

On Mon, Jun 15, 2020 at 9:44 AM Subbu Subramaniam 
wrote:

> +1
>
> On 2020/06/14 03:01:03, Seunghyun Lee  wrote:
> > Hi all,
> >
> > While I was working on setting up the daily digest from slack channels, I
> > found that the html rendering feature is turned off by default for Pinot
> > mailing list.
> >
> > I tried to request to enable the feature from
> > https://issues.apache.org/jira/browse/INFRA-20423 and this needs the
> > project consensus.
> >
> > I would like to start the vote for "enabling html for Pinot mailing
> lists".
> >
> > Please vote accordingly:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove with the reason
> >
> > Thank you!
> > Seunghyun
> >


Re: new Metadata format for download url in LLCRealtimeSegmentZKMetadata

2020-06-13 Thread TING CHEN
My understanding of Jackie's proposal is that in step (3) below, the
segment downloader just fetches from the segment download url no matter
what the value is. Whenever the download fails (an empty uri certainly
triggers a failure), the peer download path is used as the final fallback (if
the peer download option is enabled). I would say the two methods are similar
though. Adding an empty URI check will slightly improve the performance
while increasing the complexity of the logic (again only marginally).

I am also a bit concerned about how the readers of download urls will
behave with an empty URI. Let me do a code search to find out.


On Fri, Jun 12, 2020 at 10:03 AM Subbu Subramaniam 
wrote:

> OK, I think I see jackie's point. An empty URI simply means that we could
> not get it to store in deep store reliably. Architecturally, that makes
> sense. If a SegmentFetcher wants to fetch the segment and finds empty URI,
> then it needs to look at servers with ExternalView being ONLINE for the
> segment and fetch it from there.
>
> So, let us do this.
> (1) The segment completion protocol carries a URI as proposed by Ting.
> (We should not be able to distinguish between an invalid request that may
> be missing a URI vs a request that truly says that the URI is only with
> peers at this point in time).
> (2) Controller commits the metadata with an empty URI if it sees a "peer"
> scheme in the committing segment URI
> (3) Segment downloader assumes that an empty URI == download from peer.
>
> The con side of this is that we need to check everywhere to make sure that
> we are handling null URI in the metadata. It is hard to catch this in
> review, so i would ask Ting to do the due diligence here, and also have a
> couple of reviewers just in case.
>
> Regarding segment download, I have a proposal:
> A. Update the SegmentFetcher interface to introduce a new method that
> takes a list of URIs instead of one URI. Default impl for this method can
> be to call the older API with any of the URIs
> B. Modify the http segment fetcher to try a random URL if a list is given
> (or, cycle through, or whatever).
>
> This makes Ting's code simpler on the caller side
>
> -Subbu
>
> On 2020/06/12 05:26:41, TING CHEN  wrote:
> > Thanks for your comments. The download url zk Metadata format change is
> > part of the deepstore by-pass proposal. The design aims to improve on
> > status quo for two cases w.r.t deep store config:
> >
> >1. A deep store configured for Pinot cluster
> >2. No deep store configured for Pinot -- not for heavy users of Pinot
> >but still valuable for many small/medium size setup as seen from slack
> >questions.
> >
> > For case 2, because there is no deep store URI, there is a need for a new
> > download URI format in zkMetaData.
> > For case 1, our design proposal of controller and server interaction in
> > case of deep store outage is similar to your proposal. In particular,
> >
> >1. When the selected commit server fails to upload a segment *S* to the
> >deep store, it will pass a special uri *U* to the controller.
> >2. The controller will then save *U* in the download url of the segment
> >metadata.
> >3. When another server needs to download *S*, if it sees *U* it
> >downloads directly from peer servers. Otherwise it first tries to download
> >from the deep store and, if that fails, tries to download from peer servers.
> >
> > Note that the above steps work for both case 1 and case 2.
> >
> > To me, your proposal simplifies the logic above in step 3 assuming deep
> > store is configured. I.e., every time the server needs to download *S*, it
> > always downloads from deepstore first and, if that fails, downloads *S* from
> > a peer. My proposal adds a check on the uri format, but overall it works in
> > both cases, regardless of whether the deep store is configured or not.
> >
> > As for segment download cost from peer servers, I agree it is expensive
> and
> > should be avoided for high traffic clusters. But it is not the main focus
> > of this discussion.
> >
> >
> > On Thu, Jun 11, 2020 at 4:18 PM Xiaotian Jiang 
> wrote:
> >
> > > Let me elaborate more on my proposal:
> > >
> > > The problem we are trying to solve here is when deep storage has lower
> > > availability than Pinot, we should be able to leverage the segment
> > > replications on Pinot server to increase the availability of segment
> > > download.
> > > Here we should first try to download from deep storage, and only if it
> > > fails, we use peer download as the backup.
> > > It is possible 

Re: new Metadata format for download url in LLCRealtimeSegmentZKMetadata

2020-06-11 Thread TING CHEN
Thanks for your comments. The download url zk Metadata format change is
part of the deepstore by-pass proposal. The design aims to improve on
status quo for two cases w.r.t deep store config:

   1. A deep store configured for Pinot cluster
   2. No deep store configured for Pinot -- not for heavy users of Pinot
   but still valuable for many small/medium size setup as seen from slack
   questions.

For case 2, because there is no deep store URI, there is a need for a new
download URI format in zkMetaData.
For case 1, our design proposal of controller and server interaction in
case of deep store outage is similar to your proposal. In particular,

   1. When the selected commit server fails to upload a segment *S* to the
   deep store, it will pass a special uri *U* to the controller.
   2. The controller will then save *U* in the download url of the segment
   metadata.
   3. When another server needs to download *S*, if it sees *U* it
   downloads directly from peer servers. Otherwise it first tries to download
   from the deep store and, if that fails, tries to download from peer servers.

Note that the above steps work for both case 1 and case 2.
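
The three steps above can be sketched roughly as follows, in Python for brevity (Pinot is Java, and nothing here is taken from its codebase). The sketch assumes the special uri *U* takes the "peer:///segmentName" form suggested elsewhere in this thread; the function names and the "deep_store"/"peer" labels are hypothetical.

```python
from typing import Callable, List
from urllib.parse import urlparse

PEER_SCHEME = "peer"  # assumed scheme for the special uri U


def uri_to_report(segment_name: str,
                  upload_to_deep_store: Callable[[str], str]) -> str:
    """Steps 1 and 2: the uri the committing server passes to the controller.

    upload_to_deep_store is assumed to return the deep store uri on success
    and to raise IOError on failure.
    """
    try:
        return upload_to_deep_store(segment_name)
    except IOError:
        return f"{PEER_SCHEME}:///{segment_name}"


def download_sources(download_url: str) -> List[str]:
    """Step 3: the ordered sources a server tries for a recorded download url."""
    if urlparse(download_url).scheme == PEER_SCHEME:
        return ["peer"]  # U was recorded: go straight to peer servers
    return ["deep_store", "peer"]  # deep store first, peers as the fallback
```

In case 2 (no deep store configured), the upload step would simply always report the peer uri, so the same download logic applies.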

To me, your proposal simplifies the logic above in step 3 assuming deep
store is configured. I.e., every time the server needs to download *S*, it
always downloads from deepstore first and, if that fails, downloads *S* from
a peer. My proposal adds a check on the uri format, but overall it works in
both cases, regardless of whether the deep store is configured or not.

As for segment download cost from peer servers, I agree it is expensive and
should be avoided for high traffic clusters. But it is not the main focus
of this discussion.


On Thu, Jun 11, 2020 at 4:18 PM Xiaotian Jiang  wrote:

> Let me elaborate more on my proposal:
>
> The problem we are trying to solve here is when deep storage has lower
> availability than Pinot, we should be able to leverage the segment
> replications on Pinot server to increase the availability of segment
> download.
> Here we should first try to download from deep storage, and only if it
> fails, we use peer download as the backup.
> It is possible that the deep storage URL does not exist because the
> segment upload failed, in which case we should download from the
> peers.
> Notice that in both cases, peer download should be modeled as a backup
> plan instead of the main way of downloading segments.
> Also, downloading segments from a server requires the server to
> compress and send the segments, which is not a cheap operation, and
> can cause performance impact on the server. So we should only use peer
> download if there is no other option, i.e. as a backup.
> If we model peer download as a backup plan, we should not overload the
> existing downloadUri to trigger it. Instead, we should try to download
> with the downloadUri first, and only if it fails (including the case
> where downloadUri does not exist), we try to download from peer.
>
> On Thu, Jun 11, 2020 at 3:26 PM TING CHEN 
> wrote:
> >
> > Our current design
> > <
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> >
> > does not add a new pinotFS for *peer://*. This is a very interesting
> > question though. The only operations our design uses today for peer "FS"
> is
> > essentially *copyToLocal()* in the pinotFS interface. Our design
> basically
> > has a class and a few supporting methods
> > <
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion#BypassingdeepstorerequirementforRealtimesegmentcompletion-EnablebesteffortsegmentuploadinSplitSegmentCommiteranddownloadsegmentfrompeerservers
> .>
> > implementing the above method. That is why we do not add a full-fledged
> > pinotFS subclass for this design.
> >
> >
> > On Thu, Jun 11, 2020 at 3:00 PM kishore g  wrote:
> >
> > > Also, will peer:// have an implementation of pinotFS ?
> > >
> > > On Thu, Jun 11, 2020 at 2:58 PM kishore g  wrote:
> > >
> > > > +1 peer.
> > > >
> > > > unrelated to this - do we support multiple URI's?
> > > >
> > > > On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam <
> mcvsu...@apache.org>
> > > > wrote:
> > > >
> > > >> Hey Ting,
> > > >>
> > > >> I like the URI in metadata as "peer:///segmentName". This way, the
> URI
> > > >> remains parsable and we can use the scheme to check for a segment
> > > fetcher,
> > > >>
> > > >> thanks
> > > >>
> > > >> -Subbu
> > > >>
> > > >> On 2020/06

Re: new Metadata format for download url in LLCRealtimeSegmentZKMetadata

2020-06-11 Thread TING CHEN
You mean multiple URIs in a segment's download URL? No, not for this project.

On Thu, Jun 11, 2020 at 2:59 PM kishore g  wrote:

> +1 peer.
>
> unrelated to this - do we support multiple URI's?
>
> On Thu, Jun 11, 2020 at 2:51 PM Subbu Subramaniam 
> wrote:
>
> > Hey Ting,
> >
> > I like the URI in metadata as "peer:///segmentName". This way, the URI
> > remains parsable and we can use the scheme to check for a segment
> fetcher,
> >
> > thanks
> >
> > -Subbu
> >
> > On 2020/06/10 01:09:25, TING CHEN  wrote:
> > > As part of the deep store by-passing
> > > <
> >
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > >
> > > project, a server is allowed to download segments from peer Pinot
> servers
> > > during Low Level Consumer (LLC) instead of deep segment stores. To
> enable
> > > this feature, we plan to add a new URI format for the field
> > > > *segment.realtime.download.url* in LLCRealtimeSegmentZKMetadata.
> > >
> > > The new URI format serves the purpose of instructing a Pinot server to
> > find
> > > and download the segment from a peer server. Controller writes it to
> > Helix
> > > in case of segment upload failure or no deep store configured at all.
> We
> > > proposed the following format options and want to hear your feedback:
> > >
> > >1. peer:///segmentName; (my preference)
> > >2. simply an empty string *''*
> > >
> > > > Both are in essence special markers to indicate that the segment is
> not
> > > found in deep store and servers have to download them from peer
> servers.
> > > (1) has the benefit of better readability than (2) for debugging
> > purposes.
> > >
> > > Please let me know what you think.
> > >
> > > Ting Chen
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> > For additional commands, e-mail: dev-h...@pinot.apache.org
> >
> >
>


new Metadata format for download url in LLCRealtimeSegmentZKMetadata

2020-06-09 Thread TING CHEN
As part of the deep store by-passing
<https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion>
project, a server is allowed to download segments from peer Pinot servers
during Low Level Consumer (LLC) instead of deep segment stores. To enable
this feature, we plan to add a new URI format for the field
*segment.realtime.download.url* in LLCRealtimeSegmentZKMetadata.

The new URI format instructs a Pinot server to find and download the
segment from a peer server. The controller writes it to Helix when the
segment upload fails or when no deep store is configured at all. We
propose the following format options and would like to hear your feedback:

   1. peer:///segmentName; (my preference)
   2. simply an empty string *''*

Both are in essence special markers indicating that the segment is not
found in the deep store and that servers have to download it from peer servers.
(1) has the benefit of better readability than (2) for debugging purposes.
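
As an illustration of why option (1) stays parsable, here is a small sketch
using Python's standard URL parser. The helper name is made up for this
example; Pinot itself is Java and would use its own URI handling.

```python
from urllib.parse import urlparse


def peer_segment_name(download_url):
    """Return the segment name if download_url is a peer marker, else None."""
    parsed = urlparse(download_url)
    if parsed.scheme != "peer":
        return None
    # "peer:///segmentName" has an empty authority; the path is "/segmentName".
    return parsed.path.lstrip("/")
```

With option (2), the empty string has no scheme at all, so a caller has to
special-case it instead of dispatching on the scheme.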

Please let me know what you think.

Ting Chen


Re: Adding a new field in SegmentsValidationAndRetentionConfig for peer segment download

2020-05-29 Thread TING CHEN
Thanks to Subbu, Seunghyun and Jackie for the feedback and discussions on
the new table configuration field to enable peer segment download. In
the end, we decided to:

   - Add a new optional string field *peerSegmentDownloadScheme* to
   the SegmentsValidationAndRetentionConfig in the TableConfig. The value can
   be *http* or *https*.

The field will enable segment download from peer servers for both realtime
and offline tables. Initially, only realtime table segment download will be
supported. The design details can be found in this
section of the cwiki doc
<https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion#By-passingdeep-storerequirementforRealtimesegmentcompletion-EnablebesteffortsegmentuploadinSplitSegmentCommiteranddownloadsegmentfrompeerservers.>.
I will send a PR soon to add this table config.
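
For readers who want to picture the config, here is a rough sketch of what a
table config carrying the new field might look like, plus a validation helper.
Only the field name *peerSegmentDownloadScheme* and its two allowed values come
from this thread; the "segmentsConfig" key, the table name, and the helper are
assumptions made for illustration.

```python
import json

ALLOWED_SCHEMES = {"http", "https"}

# Illustrative fragment: the surrounding keys are assumptions, not taken from the PR.
table_config = json.loads("""
{
  "tableName": "myTable_REALTIME",
  "segmentsConfig": {
    "peerSegmentDownloadScheme": "http"
  }
}
""")


def peer_download_scheme(config):
    """Return the configured scheme, or None if peer download is not enabled."""
    scheme = config.get("segmentsConfig", {}).get("peerSegmentDownloadScheme")
    if scheme is None:
        return None  # field is optional: absent means peer download is off
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported peerSegmentDownloadScheme: {scheme!r}")
    return scheme
```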

Thanks,
Ting

On Tue, May 5, 2020 at 5:16 PM TING CHEN  wrote:

>
> As part of the proposal
> <https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion>
> to bypass deep store for segment completion, I plan to add a new optional
> string field *peerSegmentDownloadScheme* to
> the SegmentsValidationAndRetentionConfig in the TableConfig. The value can
> be *http* or *https*.
>
>1. SplitSegmentCommitter
>
> <https://github.com/apache/incubator-pinot/blob/31c55afdb6a40f98189308ce6292587ead9d0dec/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SplitSegmentCommitter.java>
>will check this value. If it exists, the segment committer will be able to
>finish segment commit successfully even if the upload to the segment store
>fails. The committer will report a special marker to the controller
>indicating that the segment is available on peer servers.
>2. When Pinot servers fail to download segments from the segment
>store, they can also check this field's value. If it exists, they can
>download segments from peer servers using either HTTP or HTTPS segment
>fetchers as configured. (related PR
><https://github.com/apache/incubator-pinot/pull/5336> in review for
>how to discover such servers.)
>
> Note this is a table level config. We will test the new download behavior
> in realtime tables in incremental fashion. Once fully proven, this config
> can be upgraded to server level config.
>
> Please let me know if you have any questions on this. Thanks @mcvsubbu for
> coming up with the idea and offline discussions.
>
> Ting Chen
>
>


Re: Adding a new field in SegmentsValidationAndRetentionConfig for peer segment download

2020-05-18 Thread TING CHEN
Thanks Xiaotian. My comments are inline. Some of the questions below seem
to be related to the overall deep store by-pass design rather than this table
config change. I will provide brief answers below while directing most of
them to the cwiki design discussion.

On Mon, May 18, 2020 at 5:39 PM Xiaotian Jiang  wrote:

> Seems we always use HTTP for inner cluster communication (e.g.
> TableSizeReader).
> We should make peer download the default behavior. If the segment
> download from deep storage failed, we should always try with the peer.
> For the race condition issue, we can add CRC into the peer download
> request (CRC can be obtained from the ZK metadata).
>
The race condition pointed out by @mcvsubbu is
mainly for offline segment cases -- we will not address it in this design
but leave it as a TODO. When you say "make peer download the default
behavior", I suppose you mean after we (1) test this feature for
realtime tables and (2) come up with a design for offline tables and then
test on offline tables too. If so, this table config change
is a step in that direction.
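
For reference, the CRC idea mentioned above could look roughly like the sketch
below: the downloader compares the CRC of the bytes served by a peer against
the CRC recorded in the ZK metadata and rejects a stale copy. This is only an
illustration. zlib.crc32 over the raw bytes is a stand-in, since how Pinot
computes a segment CRC is not covered in this thread, and the function name is
hypothetical.

```python
import zlib


def accept_peer_copy(segment_bytes, expected_crc):
    """Accept a peer-served segment only if its CRC matches the expected one."""
    # Mask to an unsigned 32-bit value so the comparison is stable.
    return (zlib.crc32(segment_bytes) & 0xFFFFFFFF) == expected_crc
```

In the race described by Subbu, two servers holding the old version would both
fail this check against the new CRC, so neither would mistake the other's copy
for the newest one.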

The following questions are more related to overall design:

> There are some parts not clear to me:
> - Can we always obtain the deep storage address before successfully
> uploading it?
> - Should we try to at least upload once before committing the segment?
>
With a new time-bounded and best effort segment uploader added recently (PR
5314 <https://github.com/apache/incubator-pinot/pull/5314>), the server
will wait for a configurable amount of time for the segment upload to
succeed before committing the segment. If the upload succeeds (within the
timeout period), a deep storage address will be returned.
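
A minimal sketch of the time-bounded, best-effort upload described above. It
does not reproduce the code in PR 5314; the function name and the upload
callable are hypothetical, and OSError stands in for whatever failure the real
uploader reports.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def upload_with_timeout(upload_fn, timeout_s):
    """Run upload_fn() and return its deep store URL, or None on timeout or failure."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(upload_fn)
    try:
        return future.result(timeout=timeout_s)
    except (FutureTimeout, OSError):
        # Best effort: the caller can still commit without a deep store address.
        return None
    finally:
        # Do not block the commit path waiting for a slow upload to finish.
        pool.shutdown(wait=False)
```

A None result is the case where the commit proceeds without a deep store
address.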

> - If we put downloadUrl before successfully uploading the segment, how
> to detect whether the upload is successful or not?
>
Similar to my answer to the previous question, with PR 5314
<https://github.com/apache/incubator-pinot/pull/5314> we will upload the
segment before commit. I will update the cwiki to reflect this change. The main
factor driving this change is to utilize the SplitCommitter framework.


> - When we find a segment not uploaded to the deep storage, who is
> responsible for uploading it, and who is responsible for updating the
> downloadUrl (controller or server)?
>
There will be a follow-up PR
<https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion?focusedCommentId=152115613#By-passingdeep-storerequirementforRealtimesegmentcompletion-RealtimeValidationManagerchanges>
on the RealtimeValidationManager to check for such segments and have servers
upload them to the deep store. The controller will update the ZK data.



>
> Jackie
>
> On Mon, May 18, 2020 at 4:13 PM TING CHEN  wrote:
> >
> >
> >
> > -- Forwarded message -
> > From: TING CHEN 
> > Date: Tue, May 5, 2020 at 5:16 PM
> > Subject: Adding a new field in SegmentsValidationAndRetentionConfig for
> peer segment download
> > To: 
> >
> >
> >
> > As part of the proposal to bypass deep store for segment completion, I
> plan to add a new optional string field peerSegmentDownloadScheme to the
> SegmentsValidationAndRetentionConfig in the TableConfig. The value can be
> http or https.
> >
> > SplitSegmentCommitter will check this value. If it exists, the segment
> committer will be able to finish segment commit successfully even if the
> upload to the segment store fails. The committer will report a special
> marker to the controller indicating that the segment is available on peer servers.
> > When Pinot servers fail to download segments from the segment store,
> they can also check this field's value. If it exists, they can download
> segments from peer servers using either HTTP or HTTPS segment fetchers as
> configured. (related PR in review for how to discover such servers.)
> >
> > Note this is a table level config. We will test the new download
> behavior in realtime tables in incremental fashion. Once fully proven, this
> config can be upgraded to server level config.
> >
> > Please let me know if you have any questions on this. Thanks @mcvsubbu
> for coming up with the idea and offline discussions.
> >
> > Ting Chen
> >
>


Re: Adding a new field in SegmentsValidationAndRetentionConfig for peer segment download

2020-05-18 Thread TING CHEN
On Mon, May 18, 2020 at 5:24 PM Seunghyun Lee  wrote:

> If the final goal is to make this as the server level configuration (and
> remove the table level in the future), I recommend adding this config
> inside of `StreamConfig` because it will be much easier to remove the
> config from StreamConfig because it's a map.
>
> As Subbu mentioned, we may need to put it in the table config if this
> feature needs to work for both offline/realtime.
>
> @Ting Will this feature only be used for realtime?
>

Initially yes, the peer downloading feature will only be used in real-time
ingestion. Nothing prevents offline segment loading from using peer
downloading in the future, though (with a proper design addressing a few
cases raised by @mcvsubbu). So putting it in
SegmentsValidationAndRetentionConfig could be future-proof -- deprecating a
config is not ideal but acceptable in this sense.

The current design does not say much about the offline segment story. But
overall, the benefits for offline tables are similar in the sense that
segments remain downloadable when the deep store is unavailable for
some time period.



>
>
> On Mon, May 11, 2020 at 11:10 AM Subbu Subramaniam 
> wrote:
>
> > The goal of the config change in parts other than StreamConfig is to
> > ensure that the config can be used for offline segments as well, if for
> > some reason download fails.
> >
> > However, a race condition can happen for offline segments that can be
> > dangerous. If a segment has been updated with a newer version, and
> server A
> > and B have old versions. Both of them get notified of the newer version.
> > They may try to fetch the segment and fail, and eventually fetch from
> each
> > other, and end up thinking that they have the newest version of the
> > segment. There can be other variants of this as well, with restarts of
> > server.
> >
> > I can think of some ways to fix this (e.g. in the segment update message,
> > send the crc of the new version), but these have not been fully thought
> > through. We need to vet these well before adopting them.
> >
> > I prefer option 1 since it introduces a single config.
> >
> > I would like to hear from @kishoreg and @npawar as well
> >
> > -Subbu
> >
> > On 2020/05/06 00:16:26, TING CHEN  wrote:
> > > As part of the proposal
> > > <
> >
> https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion
> > >
> > > to bypass deep store for segment completion, I plan to add a new
> optional
> > > string field *peerSegmentDownloadScheme* to
> > > the SegmentsValidationAndRetentionConfig in the TableConfig. The value
> > can
> > > be *http* or *https*.
> > >
> > >1. SplitSegmentCommitter
> > ><
> >
> https://github.com/apache/incubator-pinot/blob/31c55afdb6a40f98189308ce6292587ead9d0dec/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SplitSegmentCommitter.java
> > >
> > >will check this value. If it exists, the segment committer will be
> > able to
> > >finish segment commit successfully even if the upload to the segment
> > store
> > >fails. The committer will report a special marker to the controller
> > >indicating that the segment is available on peer servers.
> > >2. When Pinot servers fail to download segments from the segment
> > store,
> > >they can also check this field's value. If it exists, it can
> download
> > >segments from peer servers using either HTTP or HTTPS segment
> > fetchers as
> > >configured. (related PR
> > ><
> https://github.com/apache/incubator-pinot/pull/5336
> > in review for
> > how
> > >to discover such servers.)
> > >
> > > Note this is a table level config. We will test the new download
> behavior
> > > in realtime tables in incremental fashion. Once fully proven, this
> config
> > > can be upgraded to server level config.
> > >
> > > Please let me know if you have any questions on this. Thanks @mcvsubbu
> > for
> > > coming up with the idea and offline discussions.
> > >
> > > Ting Chen
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> > For additional commands, e-mail: dev-h...@pinot.apache.org
> >
> >
>


Adding a new field in SegmentsValidationAndRetentionConfig for peer segment download

2020-05-05 Thread TING CHEN
As part of the proposal
<https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion>
to bypass deep store for segment completion, I plan to add a new optional
string field *peerSegmentDownloadScheme* to
the SegmentsValidationAndRetentionConfig in the TableConfig. The value can
be *http* or *https*.

   1. SplitSegmentCommitter
   
<https://github.com/apache/incubator-pinot/blob/31c55afdb6a40f98189308ce6292587ead9d0dec/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/SplitSegmentCommitter.java>
   will check this value. If it exists, the segment committer will be able to
   finish segment commit successfully even if the upload to the segment store
   fails. The committer will report a special marker to the controller
   indicating that the segment is available on peer servers.
   2. When Pinot servers fail to download segments from the segment store,
   they can also check this field's value. If it exists, they can download
   segments from peer servers using either HTTP or HTTPS segment fetchers as
   configured. (related PR
   <https://github.com/apache/incubator-pinot/pull/5336> in review for how
   to discover such servers.)

Note this is a table level config. We will test the new download behavior
in realtime tables in incremental fashion. Once fully proven, this config
can be upgraded to server level config.

Please let me know if you have any questions on this. Thanks @mcvsubbu for
coming up with the idea and offline discussions.

Ting Chen


Re: Proposal to add a new server rest API for segment download

2020-04-08 Thread TING CHEN
A URI change based on feedback from Slack so far: the URI for download
will change to
/segments/{tableNameWithType}/{segmentName}
to keep it consistent with the controller API.
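
As a quick illustration of how a client might build the download URL for the
revised path, here is a small sketch. The host, port, and function name are
placeholders for this example; only the path template comes from the message
above.

```python
from urllib.parse import quote


def segment_download_url(scheme, host, port, table_name_with_type, segment_name):
    """Build the server-side download URL for one segment."""
    # Percent-encode each path component so unusual names cannot break the path.
    table = quote(table_name_with_type, safe="")
    segment = quote(segment_name, safe="")
    return f"{scheme}://{host}:{port}/segments/{table}/{segment}"
```

For example, segment_download_url("http", "server1", 8097, "myTable_REALTIME",
"seg_0") yields a URL ending in /segments/myTable_REALTIME/seg_0.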

On Tue, Apr 7, 2020 at 3:32 PM TING CHEN  wrote:

> As a part of the PR <https://github.com/apache/incubator-pinot/pull/4914> to
> by-pass deep-store requirement for segment completion in low level
> realtime consumer (design doc
> <https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion>),
> I propose to add a new segment download rest API to Pinot server in the
> TablesResource
> <https://github.com/apache/incubator-pinot/blob/master/pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java>
>  class:
>
>- /tables/{tableNameWithType}/segments/{segmentName}
>
> The API allows download of a segment as a zipped tar file. Its primary
> usage is for segment download when a deep store is unavailable.
>
> Please let me know if you have any questions.
>
> Thanks,
> Ting Chen
>


Proposal to add a new server rest API for segment download

2020-04-07 Thread TING CHEN
As a part of the PR <https://github.com/apache/incubator-pinot/pull/4914> to
by-pass deep-store requirement for segment completion in low level
realtime consumer (design doc
<https://cwiki.apache.org/confluence/display/PINOT/By-passing+deep-store+requirement+for+Realtime+segment+completion>),
I propose to add a new segment download rest API to Pinot server in the
TablesResource
<https://github.com/apache/incubator-pinot/blob/master/pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java>
 class:

   - /tables/{tableNameWithType}/segments/{segmentName}

The API allows download of a segment as a zipped tar file. Its primary
usage is for segment download when a deep store is unavailable.

Please let me know if you have any questions.

Thanks,
Ting Chen


Does Pinot support early termination?

2020-02-18 Thread TING CHEN
Does Pinot stop early once enough results have been collected?

We have queries of form
"SELECT * FROM table WHERE userID='H' AND sourceEventTimestamp>=t1 AND
sourceEventTimestamp<=t2 ORDER BY sourceEventTimestamp DESC LIMIT 500".

The table is sorted by sourceEventTimestamp and userID has an inverted
index. I notice that the selectivity of the query is low (meaning many rows
pass the condition), so the first 500 results should be collected
relatively quickly. But the execution times are too long, i.e., > 10s.

=== Slack conversation with Kishore and Xiang attached below===
Kishore G <https://app.slack.com/team/UDRJ7G85T>  12:32 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582057958151100>
We can do early termination if there is no order by
12:33 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582058018152300>
But with order by, there is nothing much we can do to terminate early...
12:34 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582058040153100>
What is the problem you are trying to solve?
Ting Chen <https://app.slack.com/team/UG3BZ4ALQ>  12:59 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582059583158200>
the main issue we have is query latency is too long (~15 s). For early
termination, since the table is physically sorted by the ORDER_BY column, I
suppose an ideal plan is to check the relevant segments (starting with the
segments with the largest value in the filtering range) and stop when
enough results have been collected?
Kishore G <https://app.slack.com/team/UDRJ7G85T>  1:02 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582059747160100>
That’s possible, what is the time range in the query
Ting Chen <https://app.slack.com/team/UG3BZ4ALQ>  1:04 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582059864161200>
from 7 days ago to a few second ago. Basically the past 7 days' data.
Kishore G <https://app.slack.com/team/UDRJ7G85T>  1:10 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582060200165000>
It’s a good optimization to have. Worth starting a thread and discussing
further. For now, is it possible for the client to break it up into
multiple queries- one for each day?
Ting Chen <https://app.slack.com/team/UG3BZ4ALQ>  1:12 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582060338167500>
I will file an issue for this and do some investigation of the code. Yes, your
idea is basically the workaround for now. I asked the customers to query
the past 1 day's data instead: they still got the results they needed while
the latency was halved.
Kishore G <https://app.slack.com/team/UDRJ7G85T>  1:14 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582060485168700>
Cool. What you want is doable with some optimization in the planning
phase..
Xiang Fu <https://app.slack.com/team/UGRJA9TEH>  1:46 PM
<https://apache-pinot.slack.com/archives/CDRCA57FC/p1582062381169400>
@Ting Chen <https://apache-pinot.slack.com/team/UG3BZ4ALQ>
1:46 <https://apache-pinot.slack.com/archives/CDRCA57FC/p158206240517>
one thing about this is that the query will hit many segments and merge the
results
1:47 <https://apache-pinot.slack.com/archives/CDRCA57FC/p1582062448170800>
so it’s hard to tell the global ordering to do early termination
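
To make the idea discussed above concrete, here is a rough sketch of early
termination for ORDER BY timestamp DESC LIMIT k when each segment's maximum
timestamp is known. It is not how Pinot plans queries today; the segment
layout (a dict with "max_ts" and "rows") and the function name are assumptions
for illustration only.

```python
import heapq


def top_k_latest(segments, k):
    """Return (k largest timestamps, newest first; number of segments scanned).

    Each segment is a dict with "max_ts" and "rows" (timestamps passing the filter).
    """
    if k <= 0:
        return [], 0
    # Visit segments holding the newest data first.
    ordered = sorted(segments, key=lambda s: s["max_ts"], reverse=True)
    top = []  # min-heap of the best k timestamps seen so far
    scanned = 0
    for seg in ordered:
        if len(top) == k and seg["max_ts"] <= top[0]:
            break  # no remaining segment can improve the result
        scanned += 1
        for ts in seg["rows"]:
            if len(top) < k:
                heapq.heappush(top, ts)
            elif ts > top[0]:
                heapq.heapreplace(top, ts)
    return sorted(top, reverse=True), scanned
```

The per-segment maximum timestamp is what supplies the global ordering Xiang
mentions: once k rows are held and every unvisited segment's maximum is no
larger than the smallest of them, the remaining segments can be skipped.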


Re: [VOTE] Apache Pinot (incubating) 0.2.0 RC0

2019-11-14 Thread TING CHEN
+1

Ran tests and all passed.
Ran quick demos

Left some suggestions on the release doc.

On Wed, Nov 13, 2019 at 1:54 PM kishore g  wrote:

> +1
>
> Verified signatures. Updated wiki on Prerequisites needed to verify
> signatures on Mac.
> Quickstart works well.
>
> http://localhost:9000/query/
> is the wrong url. It has additional / in the
> end that breaks the css. Not a blocker but something to fix in the next
> release.
>
> thanks
>
> On Wed, Nov 13, 2019 at 1:49 PM Seunghyun Lee  wrote:
>
> > +1
> >
> > 1. Checked that the bundles contain "incubating"
> > 2. Verified the signature and hash
> > 3. Verified that the release source matches with the code from Github
> > 4. Checked that the source can be compiled (used RHEL 7 linux)
> > 5. Ran quick starter script to check the basic functionality
> >
> >
> > On Wed, Nov 13, 2019 at 1:25 PM Subbu Subramaniam 
> > wrote:
> >
> > > Hi Seunghyun,
> > >
> > > Thanks for your vote.
> > >
> > > With respect to your suggestions:
> > > 1. 164D961B is still your key :-) I had sent a follow-up vote request
> > with
> > > my correct key (B530034C).
> > > 2. I have also added the signature to the KEYS file now.
> > >
> > > thanks again
> > >
> > > -Subbu
> > >
> > > On Tue, Nov 12, 2019 at 1:52 PM Subbu Subramaniam 
> > > wrote:
> > >
> > > > Please disregard the earlier message. Here is the correct one
> > > >
> > > > Hi Pinot Community,
> > > >
> > > > This is a call for a vote to release Apache Pinot (incubating)
> > version
> > > > 0.2.0.
> > > >
> > > > The release candidate:
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.2.0-rc0
> > > >
> > > > Git tag for this release:
> > >
> https://github.com/apache/incubator-pinot/tree/release-0.2.0-rc0
> > > > Git hash for this release: f8e1980c4160ac7fd2686d9edefab9ac0a825c5b
> > > >
> > > > The artifacts have been signed with key: B530034C, which can be
> > > > found in the following KEYS file.
> > >
> https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS
> > > >
> > > > Release notes:
> > >
> https://github.com/apache/incubator-pinot/releases/tag/release-0.2.0-rc0
> > > >
> > > > Staging repository:
> > >
> https://repository.apache.org/content/repositories/orgapachepinot-1003
> > > >
> > > > Documentation on verifying a release candidate:
> > >
> >
> https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate
> > > >
> > > >
> > > > The vote will be open for at least 72 hours or until the necessary
> > > > number of votes is reached.
> > > >
> > > > Please vote accordingly,
> > > >
> > > > [ ] +1 approve
> > > > [ ] +0 no opinion
> > > > [ ] -1 disapprove with the reason
> > > >
> > > > Thanks,
> > > > Apache Pinot (incubating) team
> > > >
> > > >
> > >
> >
>