Re: DISCUSS Pinot Graduation

2021-06-01 Thread Olivier Lamy
Good job guys!
definitely +1 to open vote (be ready as this will generate some comments
and more paperwork :) )


On Wed, 2 Jun 2021 at 03:14, Mayank Shrivastava  wrote:

> Mentors,
> Gentle ping, are we good to open a vote for Pinot's graduation? We have
> addressed all the issues brought up by Felix.
>
> Regards,
> Mayank
>
> On Sat, May 22, 2021 at 9:44 PM Mayank Shrivastava 
> wrote:
>
>> Gentle ping to mentors to guide us with graduation discussion. We have
>> addressed the comments raised by Felix in the thread earlier. Also
>> requesting comments from Olivier and Jim.
>> Thanks for your help.
>>
>> Regards,
>> Mayank
>>
>> On Mon, May 17, 2021 at 5:38 PM Mayank Shrivastava 
>> wrote:
>>
>>> Thanks Felix, for your valuable inputs. We have updated the project page
>>> and also added the maturity model. Please let us know if there are other
>>> things we should address.
>>>
>>> 1. Project Page 
>>> 2. Maturity Model
>>> 
>>>
>>> On Sun, May 16, 2021 at 1:57 PM Felix Cheung 
>>> wrote:
>>>
 Agreed it is not a blocker per se, but I figure it will raise some
 questions if more board/podling reports are missed.

 Maybe worthwhile to set up forwarding from dev@ to Slack.


 --
 *From:* kishore g 
 *Sent:* Saturday, May 15, 2021 6:53 PM
 *To:* dev@pinot.apache.org
 *Cc:* Mayank Shrivastava; j...@apache.org; ol...@apache.org
 *Subject:* Re: DISCUSS Pinot Graduation

 I agree with you on traffic on dev@. We tried to move a few discussions to
 dev@, but the entire community is active on the Slack channel and prefers
 discussion on GitHub issues. We push everything to dev@ so folks on the ML
 have all the information.

 We have < 100 on dev@ vs ~1300 on Slack. Trying to push Slack members
 to join dev@ did not work in the past and I doubt it will work now. We will
 continue to make sure all discussions on Slack are sent as a digest to dev@.

 We have mentioned this in our reports last year and we did not hear any
 objections from the board.


 Thanks,
 Kishore G

 On Fri, May 14, 2021 at 8:38 PM Felix Cheung 
 wrote:

> Generally it looks good, I’ve checked clutch report, website checks
> etc, but a few reminders and areas to pay attention to:
>
> - dev@ traffic is very low or almost zero? I realize the community is
> active on Slack and a summary is sent to dev@ every day, but some
> traffic there will be good...
>
> - ... because people will miss stuff. The podling report reminder was
> sent there and Pinot just missed this month’s podling report. Let’s make
> sure the report is out promptly next month
>
> - pls make sure the incubation status is updated
> http://incubator.apache.org/projects/pinot - for instance either the
> committer list is not sorted or the last date is wrong (should not be
> 2020). The rest of the page can use updating too. Also many sections there
> have placeholder content, pls fill them in.
>
> - also as suggested, please take the maturity model, fill it in and
> share with dev@ anything identified - project maturity model (as a
> guide)
>
> https://community.apache.org/apache-way/apache-project-maturity-model.html
>
>
>
> On Fri, May 14, 2021 at 3:13 PM Mayank Shrivastava 
> wrote:
>
>> Mentors - Felix, Olivier, Jim,
>> Wondering what your thoughts are on proposing Pinot's graduation.
>> We have addressed all the issues that have been brought up in the past. If
>> there are other steps to be taken, please let us know and we can take
>> care of those as well. Looking forward to your suggestions and support.
>>
>> Regards,
>> Mayank
>>
>> On Mon, May 10, 2021 at 12:54 PM Fu Xiang 
>> wrote:
>>
>>> +1! Glad to see we've accomplished a lot and the community is pretty
>>> strong and healthy!
>>>
>>> On Mon, May 10, 2021 at 11:23 AM Subbu Subramaniam <
>>> mcvsu...@apache.org> wrote:
>>>
 +1

 Let us know how we can help with the graduation, and if there are
 any pending items to be resolved.

 -Subbu

 On 2021/05/09 14:07:45, kishore g  wrote:
 > Hello,
 >
 >
 > I would like to start a conversation about the readiness of Apache Pinot to
 > graduate. We have come a long way since we incubated in Apache, with:
 >
 > - 7800+ contributions from 168 contributors
 > - 7 releases by various committers
 > - 6 new committers invited (all accepted)

Apache Pinot Daily Email Digest (2021-06-01)

2021-06-01 Thread Pinot Slack Email Digest
#general
@hongtaozhang: @hongtaozhang has joined the channel
@krishnalumia535: @krishnalumia535 has joined the channel
@kaustabhganguly: @kaustabhganguly has joined the channel
@kaustabhganguly: Hi everyone! I'm Kaustabh from India
@kaustabhganguly: I'm a fresh CS grad and just exploring things. I am new to streaming data, Kafka and Pinot. I want to merge batched data and streaming data and use Pinot on top of it. My solution is to use Kafka Connect, as it's an ideal solution for merging batched and streaming data into topics & partitions. So my pipeline is basically using Kafka for merging and then using Pinot for streaming from Kafka. *Is there a better solution that comes to anyone's mind? Please correct me if there's any fallacy in my logic.*
  @mayanks: Since Pinot can ingest data from offline sources directly, you could simply have Pinot ingest from a separate offline pipeline as well as the Kafka stream.
  @kaustabhganguly: Thanks. Trying that out. Will ask here if I have some doubts.
  @mayanks: Yes, feel free to ask any questions here
  @mayanks:
  @mayanks: Also see if you can find your answer ^^. If not, perhaps we can improve the docs
  @kaustabhganguly: Sure thing.
@pedro.cls93: Hello,
How does Pinot decide if a field in an incoming message is null to apply the defaultNullValue?
Does the key of the field have to be missing?

For a String field of name `fieldX` with default value `"default"`,
```{
  "schemaName": "HitExecutionView",
  "dimensionFieldSpecs": [
   {
  "name": "fieldX",
  "dataType": "STRING",
  "defaultNullValue": "default"
   },...,
]}```
if an incoming message has the following payload:
```{
...,
   "fieldX": null,
...,
}```
What is the expected value in Pinot? `null` or `"default"`?
  @mayanks: The incoming null gets translated into the default null value and stored in Pinot. So in your example, “default” will be stored
  @pedro.cls93: I'm seeing different behavior; would you mind joining a call with me and taking a look?
@anusha.munukuntla: @anusha.munukuntla has joined the channel
@kylebanker: @kylebanker has joined the channel
@ken: My ops guy is setting up Docker containers, and wants to know why the base Pinot Dockerfile has
```VOLUME ["${PINOT_HOME}/configs", "${PINOT_HOME}/data"]```
since he sees that there’s nothing being stored in the `/data` directory. Any input?
  @mayanks: Servers will store local copy of segments there?
  @ken: But normally local copies of segments are stored in `/tmp/xxx`, or so I thought?
  @dlavoie: By default, the OSS helm chart will configure $HOME/data as the data dir for Pinot
  @dlavoie: It’s in line with the default value of `controller.data.dir` of the helm chart.
  @ken: Hmm, OK. So since we’re using HDFS as the deep store, this wouldn’t be getting used, right?
  @dlavoie: Indeed
  @dlavoie: But keep in mind that servers will use that path
  @dlavoie: So the volume defined in the docker image is relevant for the segments stored by the servers.
  @ken: But wouldn’t you want that to be temp storage, and not mapped outside of Docker?
  @dlavoie: Nope
  @dlavoie: It’s the same as Kafka
  @dlavoie: sure
  @dlavoie: brokers can rebuild their data from other replicas and deepstore and everything
  @dlavoie: But, trust me, if you want to avoid network jittering when your servers are restarting, you’ll be happy with a persistent volume of your segments for the servers
  @dlavoie: Segment FS hosted by server should not be considered temporary
  @dlavoie: Deepstore download is a fallback in case of loss
  @ken: I’ll have to poke around in one of our server processes to see why the ops guy thinks there’s nothing in /data
  @ken: Thanks for the input
  @dlavoie: Check how your server data dir is configured
  @dlavoie: If you want to speed up server restart and avoid redownloading segments from deepstore, configuring the data dir of server in a persistent volume will improve stability of your cluster greatly when things go wrong
  @ken: Right. So this would be a `server.data.dir` configuration value?
  @dlavoie: `pinot.server.instance.dataDir` :upside_down_face:
  @dlavoie: the takeaway is that the volume defined in the dockerfile is opinionated with the OSS helm chart and not aligned with the default values from the… dockerfile itself…
  @ken: Nice. I guess `pinot.server.instance.segmentTarDir` can be a temp dir then.
  @dlavoie: not exactly
  @dlavoie: turns out it's more subtle than that :slightly_smiling_face:
  @dlavoie: ```  dataDir: /var/pinot/server/data/index
  segmentTarDir: /var/pinot/server/data/segment```
  @dlavoie: `pinot.server.instance.dataDir` is the index storage location, and `pinot.server.instance.segmentTarDir` is the tgz dir
  @dlavoie: helm chart stores them both in the same `data` volume of the dockerfile
  @ken: OK - seems like  could use some editing love. Currently says for `pinot.server.instance.dataDir` “Directory to hold all the data”, and for `pinot.server.instance.segmentTarDir` “Directory to hold temporary segments downloaded from Controller or Deep 
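
A minimal sketch of the directory advice in the #general thread above, not taken from Pinot itself. It uses the two property names and the two paths quoted by @dlavoie; the mount point `/var/pinot/server/data` and the helper names (`is_under`, `dirs_outside_volume`, `render_properties`) are assumptions made for illustration. It only checks that both server directories fall inside one persistent volume and prints them as key=value lines.

```python
from pathlib import PurePosixPath

# Property names and paths as quoted in the thread (OSS helm chart values).
SERVER_DIRS = {
    "pinot.server.instance.dataDir": "/var/pinot/server/data/index",
    "pinot.server.instance.segmentTarDir": "/var/pinot/server/data/segment",
}

# Assumption: where the persistent volume is mounted inside the container.
VOLUME_MOUNT = "/var/pinot/server/data"


def is_under(path: str, mount: str) -> bool:
    """Return True if path is the mount point itself or lies below it."""
    p, m = PurePosixPath(path), PurePosixPath(mount)
    return p == m or m in p.parents


def dirs_outside_volume(dirs: dict, mount: str) -> list:
    """List the property names whose directory is not on the volume."""
    return [key for key, path in dirs.items() if not is_under(path, mount)]


def render_properties(dirs: dict) -> str:
    """Render the settings as key=value lines for a server config file."""
    return "\n".join(f"{key}={path}" for key, path in dirs.items())


if __name__ == "__main__":
    print(render_properties(SERVER_DIRS))
    print("outside volume:", dirs_outside_volume(SERVER_DIRS, VOLUME_MOUNT))
```

If either directory were pointed at a location such as `/tmp/xxx`, it would show up in the "outside volume" list, which is the situation the thread warns about: segments would have to be downloaded again from the deep store after a restart.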

Re: DISCUSS Pinot Graduation

2021-06-01 Thread Mayank Shrivastava
Mentors,
Gentle ping, are we good to open a vote for Pinot's graduation? We have
addressed all the issues brought up by Felix.

Regards,
Mayank

On Sat, May 22, 2021 at 9:44 PM Mayank Shrivastava 
wrote:

> Gentle ping to mentors to guide us with graduation discussion. We have
> addressed the comments raised by Felix in the thread earlier. Also
> requesting comments from Olivier and Jim.
> Thanks for your help.
>
> Regards,
> Mayank
>
> On Mon, May 17, 2021 at 5:38 PM Mayank Shrivastava 
> wrote:
>
>> Thanks Felix, for your valuable inputs. We have updated the project page
>> and also added the maturity model. Please let us know if there are other
>> things we should address.
>>
>> 1. Project Page 
>> 2. Maturity Model
>> 
>>
>> On Sun, May 16, 2021 at 1:57 PM Felix Cheung 
>> wrote:
>>
>>> Agreed it is not a blocker per se, but I figure it will raise some
>>> questions if more board/podling reports are missed.
>>>
>>> Maybe worthwhile to set up forwarding from dev@ to Slack.
>>>
>>>
>>> --
>>> *From:* kishore g 
>>> *Sent:* Saturday, May 15, 2021 6:53 PM
>>> *To:* dev@pinot.apache.org
>>> *Cc:* Mayank Shrivastava; j...@apache.org; ol...@apache.org
>>> *Subject:* Re: DISCUSS Pinot Graduation
>>>
>>> I agree with you on traffic on dev@. We tried to move a few discussions to
>>> dev@, but the entire community is active on the Slack channel and prefers
>>> discussion on GitHub issues. We push everything to dev@ so folks on the ML
>>> have all the information.
>>>
>>> We have < 100 on dev@ vs ~1300 on Slack. Trying to push Slack members to
>>> join dev@ did not work in the past and I doubt it will work now. We will
>>> continue to make sure all discussions on Slack are sent as a digest to dev@.
>>>
>>> We have mentioned this in our reports last year and we did not hear any
>>> objections from the board.
>>>
>>>
>>> Thanks,
>>> Kishore G
>>>
>>> On Fri, May 14, 2021 at 8:38 PM Felix Cheung 
>>> wrote:
>>>
 Generally it looks good, I’ve checked clutch report, website checks
 etc, but a few reminders and areas to pay attention to:

 - dev@ traffic is very low or almost zero? I realize the community is
 active on Slack and a summary is sent to dev@ every day, but some traffic
 there will be good...

 - ... because people will miss stuff. The podling report reminder was
 sent there and Pinot just missed this month’s podling report. Let’s make
 sure the report is out promptly next month

 - pls make sure the incubation status is updated
 http://incubator.apache.org/projects/pinot - for instance either the
 committer list is not sorted or the last date is wrong (should not be
 2020). The rest of the page can use updating too. Also many sections there
 have placeholder content, pls fill them in.

 - also as suggested, please take the maturity model, fill it in and
 share with dev@ anything identified - project maturity model (as a
 guide)

 https://community.apache.org/apache-way/apache-project-maturity-model.html



 On Fri, May 14, 2021 at 3:13 PM Mayank Shrivastava 
 wrote:

> Mentors - Felix, Olivier, Jim,
> Wondering what your thoughts are on proposing Pinot's graduation.
> We have addressed all the issues that have been brought up in the past. If
> there are other steps to be taken, please let us know and we can take
> care of those as well. Looking forward to your suggestions and support.
>
> Regards,
> Mayank
>
> On Mon, May 10, 2021 at 12:54 PM Fu Xiang 
> wrote:
>
>> +1! Glad to see we've accomplished a lot and the community is pretty
>> strong and healthy!
>>
>> On Mon, May 10, 2021 at 11:23 AM Subbu Subramaniam <
>> mcvsu...@apache.org> wrote:
>>
>>> +1
>>>
>>> Let us know how we can help with the graduation, and if there are
>>> any pending items to be resolved.
>>>
>>> -Subbu
>>>
>>> On 2021/05/09 14:07:45, kishore g  wrote:
>>> > Hello,
>>> >
>>> >
>>> > I would like to start a conversation about the readiness of Apache Pinot to
>>> > graduate. We have come a long way since we incubated in Apache, with:
>>> >
>>> > - 7800+ contributions from 168 contributors
>>> > - 7 releases by various committers
>>> > - 6 new committers invited (all accepted)
>>> > - Apache website available at: https://pinot.apache.org
>>> > - Updated Apache Pinot (incubating) page
>>> > - Updated Roster Page <https://whimsy.apache.org/roster/ppmc/pinot>