Re: About maintaining the Helm's Chart of Apache Druid

2021-07-04 Thread Benedict Jin
Hi Jihoon,

Last week I asked the ASF, and according to the reply, it seems that only our 
IPMC has the authority to launch the IP Clearance process. FYI, 
https://lists.apache.org/thread.html/rfbfc5951c4524c0e68223e4fbe05a7d7ee26c185ab557d6f77a4989d%40%3Cgeneral.incubator.apache.org%3E

Regards,
Benedict Jin

On 2021/07/02 22:56:08, Jihoon Son  wrote: 
> Hey Benedict,
> 
> Any updates on this issue? I think we are going to start the release
> process for 0.22.0 soon.
> 
> On Fri, Jul 2, 2021 at 1:19 AM Benedict Jin  wrote:
> 
> > Hi Xavier,
> >
> > I'm so happy to hear that, and I look forward to your changes being
> > contributed upstream. In fact, Helm and the Operator are not in conflict;
> > their relationship is kind of like RPM and systemd. You can even convert a
> > Helm chart into an Operator, or build an Operator based on Helm. And I agree
> > with you that it would be better if we can define user scenarios.
> >
> > Regards,
> > Benedict Jin
> >
> > On 2021/06/25 22:42:32, Xavier Léauté wrote:
> > > For what it's worth, we have been using a heavily modified version of this
> > > helm chart at Confluent.
> > >
> > > I would say it is good to get a Druid cluster up and running quickly, but
> > > we had to make some significant changes to make it easier to operate a
> > > Druid cluster.
> > > It's great for initial deployment and getting all the required dependencies
> > > in place, but operations are somewhat painful and require a lot of internal
> > > Druid knowledge to not shoot yourself in the foot.
> > > Our original intention was to contribute back those changes upstream, but
> > > we have not had the time to put it in a shape that would allow others to
> > > use it.
> > >
> > > We should try to define what we want this chart to be used for, since I
> > > think the Druid k8s operator is probably a better choice for someone to run
> > > and upgrade a meaningful cluster.
> > > Another option would be to focus our effort on the Druid operator and maybe
> > > build a helm chart to get that and our external dependencies in place, I
> > > think we can provide a better experience that way.
> > > One concern with the pure helm chart is that we'll get a lot of questions
> > > on how to operate it that will likely take a lot of time to answer.
> > > Considering we'd have helm, k8s operator, and docker-compose, I think we
> > > should be conscious of the time it would take to maintain all those ways of
> > > running Druid in containers and what purpose each of them serves.
> > >
> > > Just my 2¢,
> > > Xavier
> > >
> > > On Tue, Jun 22, 2021 at 8:04 AM Benedict Jin wrote:
> > >
> > > > Hi Jihoon Son,
> > > >
> > > > Cool, thanks a lot 👍👍👍
> > > >
> > > > Regards,
> > > > Benedict Jin
> > > >
> > > > On 2021/06/21 17:10:13, Jihoon Son  wrote:
> > > > > Thanks Benedict.
> > > > > You can find another example of the IP clearance process here:
> > > > > https://mail-archives.apache.org/mod_mbox/incubator-general/202106.mbox/browser.
> > > > >
> > > > > On Fri, May 28, 2021 at 1:01 AM Benedict Jin wrote:
> > > > > >
> > > > > > Hi Jihoon Son,
> > > > > >
> > > > > > Yes, it has only been tested on a local cluster. Next, I will add
> > > > > > the automated tests.
> > > > > >
> > > > > > A Helm chart usually gets a new version every time it is modified,
> > > > > > so Apache Superset has followed this approach. Of course, I agree
> > > > > > with your suggestion: we shouldn't release the chart separately,
> > > > > > and should follow the release rhythm of Apache Druid.
> > > > > >
> > > > > > I've been busy recently, but I also took time to study this IP
> > > > > > licensing process, and I'm filling out the related forms. Thank you
> > > > > > very much for the mailing list link; it's very useful to me.
> > > > > >
> > > > > > Okay, I got it. We will solve this problem as soon as possible,
> > > > > > before the release of 0.22.0.
> > > > > >
> > > > > > Regards,
> > >

Re: About maintaining the Helm's Chart of Apache Druid

2021-07-02 Thread Benedict Jin
Hi Xavier,

I'm so happy to hear that, and I look forward to your changes being contributed 
upstream. In fact, Helm and the Operator are not in conflict; their relationship 
is kind of like RPM and systemd. You can even convert a Helm chart into an 
Operator, or build an Operator based on Helm. And I agree with you that it would 
be better if we can define user scenarios.

Regards,
Benedict Jin

On 2021/06/25 22:42:32, Xavier Léauté  wrote: 
> For what it's worth, we have been using a heavily modified version of this
> helm chart at Confluent.
> 
> I would say it is good to get a Druid cluster up and running quickly, but
> we had to make some significant changes to make it easier to operate a
> Druid cluster.
> It's great for initial deployment and getting all the required dependencies
> in place, but operations are somewhat painful and require a lot of internal
> Druid knowledge to not shoot yourself in the foot.
> Our original intention was to contribute back those changes upstream, but
> we have not had the time to put it in a shape that would allow others to
> use it.
> 
> We should try to define what we want this chart to be used for, since I
> think the Druid k8s operator is probably a better choice for someone to run
> and upgrade a meaningful cluster.
> Another option would be to focus our effort on the Druid operator and maybe
> build a helm chart to get that and our external dependencies in place, I
> think we can provide a better experience that way.
> One concern with the pure helm chart is that we'll get a lot of questions
> on how to operate it that will likely take a lot of time to answer.
> Considering we'd have helm, k8s operator, and docker-compose, I think we
> should be conscious of the time it would take to maintain all those ways of
> running Druid in containers and what purpose each of them serves.
> 
> Just my 2¢,
> Xavier
> 
> On Tue, Jun 22, 2021 at 8:04 AM Benedict Jin  wrote:
> 
> > Hi Jihoon Son,
> >
> > Cool, thanks a lot 👍👍👍
> >
> > Regards,
> > Benedict Jin
> >
> > On 2021/06/21 17:10:13, Jihoon Son  wrote:
> > > Thanks Benedict.
> > > You can find another example of the IP clearance process here:
> > > https://mail-archives.apache.org/mod_mbox/incubator-general/202106.mbox/browser.
> > >
> > > On Fri, May 28, 2021 at 1:01 AM Benedict Jin wrote:
> > > >
> > > > Hi Jihoon Son,
> > > >
> > > > Yes, it has only been tested on a local cluster. Next, I will add
> > > > the automated tests.
> > > >
> > > > A Helm chart usually gets a new version every time it is modified,
> > > > so Apache Superset has followed this approach. Of course, I agree
> > > > with your suggestion: we shouldn't release the chart separately,
> > > > and should follow the release rhythm of Apache Druid.
> > > >
> > > > I've been busy recently, but I also took time to study this IP
> > > > licensing process, and I'm filling out the related forms. Thank you
> > > > very much for the mailing list link; it's very useful to me.
> > > >
> > > > Okay, I got it. We will solve this problem as soon as possible,
> > > > before the release of 0.22.0.
> > > >
> > > > Regards,
> > > > Benedict Jin
> > > >
> > > > On 2021/05/22 18:53:35, Jihoon Son  wrote:
> > > > > Thanks for adding details.
> > > > >
> > > > > Based on your answers, I assume it is not being tested. We should add
> > > > > tests as soon as possible. Since you have some experience in this
> > > > > area, can you add some?
> > > > > I'm also not sure why Superset releases the helm chart separately. I
> > > > > would suggest releasing it per our regular release schedule unless
> > > > > there is a good reason for doing so. This will reduce the release
> > > > > burden of the community.
> > > > >
> > > > > So far, it seems reasonable to me to host the helm chart in the Druid
> > > > > repo. However, as I mentioned before, the migration process might not
> > > > > be proper. I haven't had a chance to look at the IP clearance process
> > > > > closely yet and probably will not have one even in the near future.
> > > > > Benedict, you are a PMC member too. Can you please study the process
> > > > > and give us suggestions on what we should do? Since many people in
> > > > > the Druid community might not be familiar with this process, you may
&g

Re: About maintaining the Helm's Chart of Apache Druid

2021-06-22 Thread Benedict Jin
Hi Jihoon Son,

Cool, thanks a lot 👍👍👍

Regards,
Benedict Jin

On 2021/06/21 17:10:13, Jihoon Son  wrote: 
> Thanks Benedict.
> You can find another example of the IP clearance process here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/202106.mbox/browser.
> 
> On Fri, May 28, 2021 at 1:01 AM Benedict Jin  wrote:
> >
> > Hi Jihoon Son,
> >
> > Yes, it has only been tested on a local cluster. Next, I will add the 
> > automated tests.
> >
> > A Helm chart usually gets a new version every time it is modified, so 
> > Apache Superset has followed this approach. Of course, I agree with your 
> > suggestion: we shouldn't release the chart separately, and should follow 
> > the release rhythm of Apache Druid.
> >
> > I've been busy recently, but I also took time to study this IP licensing 
> > process, and I'm filling out the related forms. Thank you very much for the 
> > mailing list link; it's very useful to me.
> >
> > Okay, I got it. We will solve this problem as soon as possible, before the 
> > release of 0.22.0.
> >
> > Regards,
> > Benedict Jin
> >
> > On 2021/05/22 18:53:35, Jihoon Son  wrote:
> > > Thanks for adding details.
> > >
> > > Based on your answers, I assume it is not being tested. We should add
> > > tests as soon as possible. Since you have some experience in this
> > > area, can you add some?
> > > I'm also not sure why Superset releases the helm chart separately. I
> > > would suggest releasing it per our regular release schedule unless
> > > there is a good reason for doing so. This will reduce the release
> > > burden of the community.
> > >
> > > So far, it seems reasonable to me to host the helm chart in the Druid
> > > repo. However, as I mentioned before, the migration process might not
> > > be proper. I haven't had a chance to look at the IP clearance process
> > > closely yet and probably will not have one even in the near future.
> > > Benedict, you are a PMC member too. Can you please study the process
> > > and give us suggestions on what we should do? Since many people in
> > > the Druid community might not be familiar with this process, you may
> > > want to ask questions in the Apache general mailing list. See
> > > https://www.mail-archive.com/general@incubator.apache.org/msg74849.html
> > > as a reference.
> > >
> > > We should resolve this issue before the 0.22.0 release. Otherwise, we
> > > will have to revert all changes related to the helm chart in the
> > > release branch because it doesn't necessarily seem to be a release blocker
> > > for 0.22.0.
> > >
> > >
> > > On Sun, May 2, 2021 at 8:15 PM Benedict Jin  wrote:
> > > >
> > > > Hi Jihoon Son,
> > > >
> > > > Thank you very much for this list of questions. The following is my 
> > > > personal understanding.
> > > >
> > > > 1. What is the current status of the project?
> > > >
> > > > Helm has undergone a major version upgrade. Helm 3 will continue to be 
> > > > maintained, but Helm 2 will no longer be maintained, and the Helm charts 
> > > > for Apache projects are no longer maintained in Helm 3. It is 
> > > > recommended to maintain them in their respective Apache projects.
> > > >
> > > > 2. Does it reflect the most recent release of Druid?
> > > >
> > > > It will not affect the release of Druid. Druid's Helm chart is released 
> > > > separately. FYI, https://github.com/apache/superset/releases
> > > >
> > > > 3. How is it being tested?
> > > >
> > > > It can be verified and tested automatically through GitHub Actions. FYI, 
> > > > https://github.com/apache/superset/blob/master/.github/workflows/superset-helm-lint.yml
> > > >
> > > > 4. Why is it best to host the helm chart in the druid repo?
> > > >
> > > > Because Helm 3 no longer maintains the Helm charts of Apache 
> > > > projects, it is also officially recommended to maintain them in their 
> > > > respective projects. Many projects have already done so. Maintaining 
> > > > the chart in the main Druid repository lets more people see it and 
> > > > helps more people quickly build a Druid environment on K8s.
> > > >
> > > > Hope these answers can answer some of your doubts. Thanks aga

Re: About maintaining the Helm's Chart of Apache Druid

2021-05-28 Thread Benedict Jin
Hi Jihoon Son,

Yes, it has only been tested on a local cluster. Next, I will add the 
automated tests.

A Helm chart usually gets a new version every time it is modified, so 
Apache Superset has followed this approach. Of course, I agree with your 
suggestion: we shouldn't release the chart separately, and should follow 
the release rhythm of Apache Druid.

I've been busy recently, but I also took time to study this IP licensing 
process, and I'm filling out the related forms. Thank you very much for the 
mailing list link; it's very useful to me.

Okay, I got it. We will solve this problem as soon as possible, before the 
release of 0.22.0.

Regards,
Benedict Jin

On 2021/05/22 18:53:35, Jihoon Son  wrote: 
> Thanks for adding details.
> 
> Based on your answers, I assume it is not being tested. We should add
> tests as soon as possible. Since you have some experience in this
> area, can you add some?
> I'm also not sure why Superset releases the helm chart separately. I
> would suggest releasing it per our regular release schedule unless
> there is a good reason for doing so. This will reduce the release
> burden of the community.
> 
> So far, it seems reasonable to me to host the helm chart in the Druid
> repo. However, as I mentioned before, the migration process might not
> be proper. I haven't had a chance to look at the IP clearance process
> closely yet and probably will not have one even in the near future.
> Benedict, you are a PMC member too. Can you please study the process
> and give us suggestions on what we should do? Since many people in
> the Druid community might not be familiar with this process, you may
> want to ask questions in the Apache general mailing list. See
> https://www.mail-archive.com/general@incubator.apache.org/msg74849.html
> as a reference.
> 
> We should resolve this issue before the 0.22.0 release. Otherwise, we
> will have to revert all changes related to the helm chart in the
> release branch because it doesn't necessarily seem to be a release blocker
> for 0.22.0.
> 
> 
> On Sun, May 2, 2021 at 8:15 PM Benedict Jin  wrote:
> >
> > Hi Jihoon Son,
> >
> > Thank you very much for this list of questions. The following is my 
> > personal understanding.
> >
> > 1. What is the current status of the project?
> >
> > Helm has undergone a major version upgrade. Helm 3 will continue to be 
> > maintained, but Helm 2 will no longer be maintained, and the Helm charts 
> > for Apache projects are no longer maintained in Helm 3. It is 
> > recommended to maintain them in their respective Apache projects.
> >
> > 2. Does it reflect the most recent release of Druid?
> >
> > It will not affect the release of Druid. Druid's Helm chart is released 
> > separately. FYI, https://github.com/apache/superset/releases
> >
> > 3. How is it being tested?
> >
> > It can be verified and tested automatically through GitHub Actions. FYI, 
> > https://github.com/apache/superset/blob/master/.github/workflows/superset-helm-lint.yml
> >
> > 4. Why is it best to host the helm chart in the druid repo?
> >
> > Because Helm 3 no longer maintains the Helm charts of Apache 
> > projects, it is also officially recommended to maintain them in their 
> > respective projects. Many projects have already done so. Maintaining the 
> > chart in the main Druid repository lets more people see it and helps more 
> > people quickly build a Druid environment on K8s.
> >
> > Hope these answers resolve some of your doubts. Thanks again.
> >
> > Regards,
> > Benedict Jin
> >
> > On 2021/04/27 03:17:29, Julian Hyde  wrote:
> > > This code was developed outside of the ASF, so it's possible that we need 
> > > to go through the IP clearance process [1]. Can a PMC member please 
> > > figure out the answer to that question, and answer on this list?
> > >
> > > Has the Helm project given any indication whether they approve or 
> > > disapprove of the code being copied into Druid?
> > >
> > > Does Druid intend to take ownership of the code? I.e. be the one and only 
> > > copy of this code, do necessary maintenance work (especially including 
> > > security fixes), and accept patches.
> > >
> > > Julian
> > >
> > > [1] https://incubator.apache.org/ip-clearance/
> > >
> > >
> > >
> > > > On Apr 26, 2021, at 7:33 PM, Benedict Jin  wrote:
> > > >
> > > > Hi Senlan,
> > > >
> > > > Thank you very much for your message and support. I have created a PR 
> > > &

Re: About maintaining the Helm's Chart of Apache Druid

2021-05-02 Thread Benedict Jin
Hi Jihoon Son,

Thank you very much for this list of questions. The following is my personal 
understanding.

1. What is the current status of the project?

Helm has undergone a major version upgrade. Helm 3 will continue to be 
maintained, but Helm 2 will no longer be maintained, and the Helm charts for 
Apache projects are no longer maintained in Helm 3. It is recommended to 
maintain them in their respective Apache projects.

2. Does it reflect the most recent release of Druid?

It will not affect the release of Druid. Druid's Helm chart is released 
separately. FYI, https://github.com/apache/superset/releases

3. How is it being tested?

It can be verified and tested automatically through GitHub Actions. FYI, 
https://github.com/apache/superset/blob/master/.github/workflows/superset-helm-lint.yml

4. Why is it best to host the helm chart in the druid repo?

Because Helm 3 no longer maintains the Helm charts of Apache projects, it is 
also officially recommended to maintain them in their respective projects. Many 
projects have already done so. Maintaining the chart in the main Druid 
repository lets more people see it and helps more people quickly build a Druid 
environment on K8s.

Hope these answers resolve some of your doubts. Thanks again.

Regards,
Benedict Jin

On 2021/04/27 03:17:29, Julian Hyde  wrote: 
> This code was developed outside of the ASF, so it's possible that we need to 
> go through the IP clearance process [1]. Can a PMC member please figure out 
> the answer to that question, and answer on this list?
> 
> Has the Helm project given any indication whether they approve or disapprove 
> of the code being copied into Druid?
> 
> Does Druid intend to take ownership of the code? I.e. be the one and only 
> copy of this code, do necessary maintenance work (especially including 
> security fixes), and accept patches.
> 
> Julian
> 
> [1] https://incubator.apache.org/ip-clearance/
> 
> 
> 
> > On Apr 26, 2021, at 7:33 PM, Benedict Jin  wrote:
> > 
> > Hi Senlan,
> > 
> > Thank you very much for your message and support. I have created a PR to do 
> > the migration, referring to the experience of Apache Superset. FYI, 
> > https://github.com/apache/druid/pull/11163 and 
> > https://github.com/apache/superset/tree/master/helm/superset .
> > 
> > Regards,
> > Benedict Jin
> > 
> > On 2021/04/26 12:59:12, Senlan Yao  wrote: 
> >> Thanks @Benedict Jin,
> >> It is a good idea to maintain the Druid Helm's Chart in the Apache 
> >> repository.
> >> Since the "https://github.com/helm/charts" repo has been deprecated, we can't 
> >> maintain the chart there, and k8s users can't find the Druid chart package on 
> >> "https://artifacthub.io/packages/search?page=1&ts_query_web=druid" any 
> >> more.
> >> If we can maintain the chart in the Apache repository, it will solve 
> >> https://github.com/apache/druid/issues/5582, and k8s users will also be able 
> >> to install Druid from the helm chart package.
> >> 
> >> On 2021/04/22 01:58:46, Benedict Jin  wrote: 
> >>> Hi all,
> >>> 
> >>> Should we maintain the Helm chart in the Apache Druid repository? The 
> >>> development and maintenance of the Helm chart have stalled. It was 
> >>> previously maintained by @maver1ck, @AWaterColorPen, and me (@asdf2014). 
> >>> Currently, https://github.com/helm/charts/tree/master/incubator/druid 
> >>> can no longer be maintained. I recommend that we just copy this directory 
> >>> to the root directory of https://github.com/apache/druid and 
> >>> maintain it there. But I'm not sure whether there is a license issue. What 
> >>> do you think?
> >>> 
> >>> Regards,
> >>> Benedict Jin
> >>> 
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> >>> For additional commands, e-mail: dev-h...@druid.apache.org



Re: About maintaining the Helm's Chart of Apache Druid private@druid

2021-04-26 Thread Benedict Jin
Hi Senlan,

Thank you very much for your message and support. I have created a PR to do the 
migration, referring to the experience of Apache Superset. FYI, 
https://github.com/apache/druid/pull/11163 and 
https://github.com/apache/superset/tree/master/helm/superset .

Regards,
Benedict Jin

On 2021/04/26 12:59:12, Senlan Yao  wrote: 
> Thanks @Benedict Jin,
> It is a good idea to maintain the Druid Helm's Chart in the Apache repository.
> Since the "https://github.com/helm/charts" repo has been deprecated, we can't 
> maintain the chart there, and k8s users can't find the Druid chart package on 
> "https://artifacthub.io/packages/search?page=1&ts_query_web=druid" any more.
> If we can maintain the chart in the Apache repository, it will solve 
> https://github.com/apache/druid/issues/5582, and k8s users will also be able 
> to install Druid from the helm chart package.
> 
> On 2021/04/22 01:58:46, Benedict Jin  wrote: 
> > Hi all,
> > 
> > Should we maintain the Helm chart in the Apache Druid repository? The 
> > development and maintenance of the Helm chart have stalled. It was 
> > previously maintained by @maver1ck, @AWaterColorPen, and me (@asdf2014). 
> > Currently, https://github.com/helm/charts/tree/master/incubator/druid 
> > can no longer be maintained. I recommend that we just copy this directory 
> > to the root directory of https://github.com/apache/druid and maintain it 
> > there. But I'm not sure whether there is a license issue. What do you think?
> > 
> > Regards,
> > Benedict Jin



About maintaining the Helm's Chart of Apache Druid private@druid

2021-04-21 Thread Benedict Jin
Hi all,

Should we maintain the Helm chart in the Apache Druid repository? The 
development and maintenance of the Helm chart have stalled. It was previously 
maintained by @maver1ck, @AWaterColorPen, and me (@asdf2014). Currently, 
https://github.com/helm/charts/tree/master/incubator/druid can no longer be 
maintained. I recommend that we just copy this directory to the root 
directory of https://github.com/apache/druid and maintain it there. But I'm not 
sure whether there is a license issue. What do you think?

Regards,
Benedict Jin




Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-07 Thread Benedict Jin
Hi Julian Jaffe,

Thank you very much. I haven't tried it yet. Can you provide a more specific 
example? In theory, adding indexes will slow down insert and update 
operations. In your scenario, how large is this performance loss, as a 
percentage? And yes, regarding the Coordinator bottleneck, should we consider 
introducing a federation architecture to Apache Druid?
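
To make the indexing suggestion concrete, here is a minimal sketch of the 
read-side effect. It uses Python's built-in sqlite3 module as a stand-in for 
the real metadata store, and the table layout, the choice of `used` as the 
filter column, and the index name idx_segments_used are all assumptions made 
for illustration, not Druid's actual schema or anything confirmed in this 
thread. It does not measure the write-side cost asked about above; that would 
have to be benchmarked on the real database.

```python
import sqlite3

# In-memory stand-in for the metadata store; the columns are a simplified,
# assumed subset of a segments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE druid_segments ("
    " id TEXT PRIMARY KEY,"
    " dataSource TEXT NOT NULL,"
    " used INTEGER NOT NULL,"
    " payload BLOB)"
)
rows = [(f"seg-{i}", "wiki", i % 2, b"{}") for i in range(1000)]
conn.executemany("INSERT INTO druid_segments VALUES (?, ?, ?, ?)", rows)

QUERY = "SELECT id FROM druid_segments WHERE used = 1"


def plan(connection, sql):
    """Return the planner's description of how it would run `sql`."""
    cursor = connection.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in cursor)


# Without an index on the filter column, the query has to scan the table.
plan_before = plan(conn, QUERY)

# Hypothetical index on the assumed filter column (plus id, so the query
# can be answered from the index alone).
conn.execute("CREATE INDEX idx_segments_used ON druid_segments (used, id)")
plan_after = plan(conn, QUERY)

used_count = conn.execute(
    "SELECT COUNT(*) FROM druid_segments WHERE used = 1"
).fetchone()[0]

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, so the 
useful signal is only whether the index name shows up in the second plan.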

Regards,
Benedict Jin

On 2021/04/07 06:27:58, Julian Jaffe  wrote: 
> Hey Benedict,
> 
> Have you tried creating indices on your segments table? I've managed Druid 
> clusters with orders of magnitude more segments without this issue by 
> indexing key filter columns. (The coordinator is still a painful bottleneck, 
> just not due to query times to the metadata server 😛)
> 
> Best,
> Julian
> 
> > On Apr 6, 2021, at 8:53 PM, Benedict Jin  wrote:
> > 
> > Hi Jihoon Son,
> > 
> > Yes, it does bring some compatibility issues. I was checking the latest 
> > metadata information just now. At present, the total number of records in 
> > the metadata table is five million, of which nearly half are marked as 
> > used, and the physical resources of the machine where the metadata is 
> > stored are relatively idle.
> > 
> > Regards,
> > Benedict Jin
> > 
> >> On 2021/04/07 02:35:32, Jihoon Son  wrote: 
> >> For this sort of issue, we should think about if there is any other
> >> way that can address the same problem without modifying metadata table
> >> schema.
> >> Because, modifying metadata table schema introduces compatibility
> >> issues, such as the upgrade path for existing users.
> >> 
> >> Benedict, as Samarth and Lucas pointed out, it would be nice if you
> >> share more details of exactly where the bottleneck is. That will make
> >> the problem clearer and get everyone on the same page.
> >> 
> >>> On Tue, Apr 6, 2021 at 6:54 PM Benedict Jin  wrote:
> >>> 
> >>> Hi Ben Krug,
> >>> 
> >>> +1 for adding the is_deleted column, and then we can create a timing 
> >>> trigger to clear these old records.
> >>> 
> >>> Regards,
> >>> Benedict Jin
> >>> 
> >>> On 2021/04/06 18:28:45, Ben Krug  wrote:
> >>>> Oh, that's easier than tombstones.  flag is_deleted and update timestamp
> >>>> (so it gets pulled again).
> >>>> 
> >>>> On Tue, Apr 6, 2021 at 10:48 AM Tijo Thomas wrote:
> >>>> 
> >>>>> Abhishek,
> >>>>> Good point.  Do we need one more col for storing if it's deleted or not?
> >>>>> 
> >>>>> On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal wrote:
> >>>>> 
> >>>>>> If an entry is deleted from the metadata, how is the coordinator going
> >>>>>> to update its own state?
> >>>>>> 
> >>>>>> On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe  wrote:
> >>>>>> 
> >>>>>>> Hey,
> >>>>>>> I'm not a Druid developer, so it's quite possible I'm missing many
> >>>>>>> considerations here, but from a first glance, I like your offer, as it
> >>>>>>> resembles the *tsColumn* in JDBC lookups
> >>>>>>> (https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup).
> >>>>>>> 
> >>>>>>> Anyway, just my 2 cents.
> >>>>>>> 
> >>>>>>> Thanks!
> >>>>>>>  Itai
> >>>>>>> 
> >>>>>>> On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin wrote:
> >>>>>>> 
> >>>>>>>> Hi all,
> >>>>>>>> 
> >>>>>>>> Recently, when the Coordinator in our company's Druid cluster pulls
> >>>>>>>> metadata, there is a performance bottleneck. The main reason is the
> >>>>>>>> huge amount of metadata, which leads to a very slow process of
> >>>>>>>> scanning the full
> >>>>>>>> table of me

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Jihoon Son,

Yes, it does bring some compatibility issues. I was checking the latest 
metadata information just now. At present, the total number of records in the 
metadata table is five million, of which nearly half are marked as used, and 
the physical resources of the machine where the metadata is stored are 
relatively idle.

Regards,
Benedict Jin

On 2021/04/07 02:35:32, Jihoon Son  wrote: 
> For this sort of issue, we should think about if there is any other
> way that can address the same problem without modifying metadata table
> schema.
> Because, modifying metadata table schema introduces compatibility
> issues, such as the upgrade path for existing users.
> 
> Benedict, as Samarth and Lucas pointed out, it would be nice if you
> share more details of exactly where the bottleneck is. That will make
> the problem clearer and get everyone on the same page.
> 
> On Tue, Apr 6, 2021 at 6:54 PM Benedict Jin  wrote:
> >
> > Hi Ben Krug,
> >
> > +1 for adding the is_deleted column, and then we can create a timing 
> > trigger to clear these old records.
> >
> > Regards,
> > Benedict Jin
> >
> > On 2021/04/06 18:28:45, Ben Krug  wrote:
> > > Oh, that's easier than tombstones.  flag is_deleted and update timestamp
> > > (so it gets pulled again).
> > >
> > > > On Tue, Apr 6, 2021 at 10:48 AM Tijo Thomas wrote:
> > > >
> > > > > Abhishek,
> > > > > Good point.  Do we need one more col for storing if it's deleted or not?
> > > > >
> > > > > On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal wrote:
> > > > >
> > > > > > If an entry is deleted from the metadata, how is the coordinator
> > > > > > going to update its own state?
> > > > > >
> > > > > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe wrote:
> > > > > >
> > > > > > > Hey,
> > > > > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > > > > considerations here, but from a first glance, I like your offer, as
> > > > > > > it resembles the *tsColumn* in JDBC lookups
> > > > > > > (https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup).
> > > > > > >
> > > > > > > Anyway, just my 2 cents.
> > > > > > >
> > > > > > > Thanks!
> > > > > > >   Itai
> > > > > > >
> > > > > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Recently, when the Coordinator in our company's Druid cluster
> > > > > > > > pulls metadata, there is a performance bottleneck. The main
> > > > > > > > reason is the huge amount of metadata, which leads to a very
> > > > > > > > slow process of scanning the full table of metadata storage and
> > > > > > > > deserializing metadata. The size of the full metadata has been
> > > > > > > > reduced through TTL, Compaction, Rollup, etc., but the effect is
> > > > > > > > not very significant. Therefore, I want to design a scheme for
> > > > > > > > Coordinator to pull metadata incrementally, that is, each time
> > > > > > > > Coordinator only pulls newly added metadata, so as to reduce the
> > > > > > > > query pressure of metadata storage and the pressure of
> > > > > > > > deserializing metadata.
> > > > > > > > The general idea is to add a column last_update to the
> > > > > > > > druid_segments table to record the update time of each record.
> > > > > > > > Furthermore, when we query the metadata table, we can add filter
> > > > > > > > conditions for the last_update column
>

Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Ben Krug,

+1 for adding the is_deleted column. We can then create a scheduled job to 
clear out the old records.
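
A minimal sketch of the is_deleted idea, under stated assumptions: the table 
below is a simplified, hypothetical stand-in for druid_segments, timestamps 
are plain integers, and SQLite (through Python's sqlite3 module) is used only 
so the example runs anywhere. The helper names upsert, soft_delete, and purge 
are made up for illustration and are not Druid APIs.

```python
import sqlite3

# Hypothetical, simplified stand-in for the druid_segments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments ("
    " id TEXT PRIMARY KEY,"
    " payload TEXT NOT NULL,"
    " is_deleted INTEGER NOT NULL DEFAULT 0,"
    " last_update INTEGER NOT NULL)"
)


def upsert(seg_id, payload, now):
    # Insert or overwrite a live (not deleted) row.
    conn.execute(
        "INSERT OR REPLACE INTO segments (id, payload, is_deleted, last_update)"
        " VALUES (?, ?, 0, ?)",
        (seg_id, payload, now),
    )


def soft_delete(seg_id, now):
    # Flag the row instead of removing it, and bump last_update so that
    # an incremental reader picks the change up on its next pull.
    conn.execute(
        "UPDATE segments SET is_deleted = 1, last_update = ? WHERE id = ?",
        (now, seg_id),
    )


def purge(older_than):
    # Scheduled cleanup: physically remove rows flagged before the cutoff.
    cur = conn.execute(
        "DELETE FROM segments WHERE is_deleted = 1 AND last_update < ?",
        (older_than,),
    )
    return cur.rowcount


upsert("seg-a", "{}", now=100)
upsert("seg-b", "{}", now=100)
soft_delete("seg-a", now=200)
```

The purge cutoff has to be older than the slowest reader's last pull. 
Otherwise a row could be removed before every reader has seen its deletion 
flag.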

Regards,
Benedict Jin

On 2021/04/06 18:28:45, Ben Krug  wrote: 
> Oh, that's easier than tombstones.  flag is_deleted and update timestamp
> (so it gets pulled again).
> 
> On Tue, Apr 6, 2021 at 10:48 AM Tijo Thomas  wrote:
> 
> > Abhishek,
> > Good point.  Do we need one more col for storing if it's deleted or not?
> >
> > On Tue, Apr 6, 2021 at 4:32 PM Abhishek Agarwal wrote:
> >
> > > If an entry is deleted from the metadata, how is the coordinator going to
> > > update its own state?
> > >
> > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe  wrote:
> > >
> > > > Hey,
> > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > considerations here, but from a first glance, I like your offer, as it
> > > > resembles the *tsColumn *in JDBC lookups (
> > > >
> > > >
> > >
> > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > > ).
> > > >
> > > > Anyway, just my 2 cents.
> > > >
> > > > Thanks!
> > > >   Itai
> > > >
> > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Recently, when the Coordinator in our company's Druid cluster pulls
> > > > > metadata, there is a performance bottleneck. The main reason is the
> > > huge
> > > > > amount of metadata, which leads to a very slow process of scanning
> > the
> > > > full
> > > > > table of metadata storage and deserializing metadata. The size of the
> > > > full
> > > > > metadata has been reduced through TTL, Compaction, Rollup, and etc.,
> > > but
> > > > > the effect is not very significant. Therefore, I want to design a
> > > scheme
> > > > > for Coordinator to pull metadata incrementally, that is, each time
> > > > > Coordinator only pulls newly added metadata, so as to reduce the
> > query
> > > > > pressure of metadata storage and the pressure of deserializing
> > > metadata.
> > > > > The general idea is to add a column last_update to the druid_segments
> > > > table
> > > > > to record the update time of each record. Furthermore, when we query
> > > the
> > > > > metadata table, we can add filter conditions for the last_update
> > column
> > > > to
> > > > > avoid full table scan operations. Moreover, whether it is MySQL or
> > > > > PostgreSQL as the metadata storage medium, it can support
> > > > >  automatic update of the timestamp field, which is somewhat similar
> > to
> > > > the
> > > > > characteristics of triggers. So, have you encountered this problem
> > > > before?
> > > > > If so, how did you solve it? In addition, do you have any suggestions
> > > or
> > > > > comments on the above incremental acquisition of metadata? Please let
> > > me
> > > > > know, thanks a lot.
> > > > >
> > > > > Regards,
> > > > > Benedict Jin
> > > > >
> > > > > -
> > > > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > > For additional commands, e-mail: dev-h...@druid.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Thanks & Regards
> > Tijo Thomas
> >
> 

-
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org



Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Samarth Jain,

Thanks. The main reason is the huge amount of metadata, which leads to a very 
slow process of scanning the full table of metadata storage and deserializing 
metadata. Yes, I have tried to clean up the metadata.

Regards,
Benedict Jin

On 2021/04/06 17:20:26, Samarth Jain  wrote: 
> Hi Benedict,
> 
> I am curious to understand what functionality of Druid are you seeing the
> slowness in? Is it the coordinator work of assigning segments to
> historicals that is slower or is it the querying of segment information
> that is slower? Have you looked into CPU/network metrics for your metadata
> RDS? Maybe scaling up to a bigger instance would help. It would also be
> good to see the query patterns and possibly tweak or add new indexes to
> help speed up. Also, do you have the cleanup of metadata rows enabled (
> https://druid.apache.org/docs/latest/tutorials/tutorial-delete-data.html#run-a-kill-task
> and druid.coordinator.kill.on)? That should further help control the
> size of the druid_segments table.
> 
> On Tue, Apr 6, 2021 at 8:08 AM Ben Krug  wrote:
> 
> > I suppose, if we were going down this path, something like tombstones in
> > Cassandra could be used.
> > But it would increase the complexity significantly.
> > Ie, a new row is inserted with a deletion marker and a timestamp, that
> > indicates that the corresponding row is deleted.
> > Now, when anyone does scan the table, they need to check for tombstones too
> > and process that logic.  Then, after
> > a configurable amount of time, both the original row and the tombstone row
> > can be cleaned up.
> >
> > Probably a lot of work and complexity for this one use case, though.
> >
> > On Tue, Apr 6, 2021 at 4:02 AM Abhishek Agarwal wrote:
> >
> > > If an entry is deleted from the metadata, how is the coordinator going to
> > > update its own state?
> > >
> > > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe  wrote:
> > >
> > > > Hey,
> > > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > > considerations here, but from a first glance, I like your offer, as it
> > > > resembles the *tsColumn *in JDBC lookups (
> > > >
> > > >
> > >
> > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > > ).
> > > >
> > > > Anyway, just my 2 cents.
> > > >
> > > > Thanks!
> > > >   Itai
> > > >
> > > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Recently, when the Coordinator in our company's Druid cluster pulls
> > > > > metadata, there is a performance bottleneck. The main reason is the
> > > huge
> > > > > amount of metadata, which leads to a very slow process of scanning
> > the
> > > > full
> > > > > table of metadata storage and deserializing metadata. The size of the
> > > > full
> > > > > metadata has been reduced through TTL, Compaction, Rollup, and etc.,
> > > but
> > > > > the effect is not very significant. Therefore, I want to design a
> > > scheme
> > > > > for Coordinator to pull metadata incrementally, that is, each time
> > > > > Coordinator only pulls newly added metadata, so as to reduce the
> > query
> > > > > pressure of metadata storage and the pressure of deserializing
> > > metadata.
> > > > > The general idea is to add a column last_update to the druid_segments
> > > > table
> > > > > to record the update time of each record. Furthermore, when we query
> > > the
> > > > > metadata table, we can add filter conditions for the last_update
> > column
> > > > to
> > > > > avoid full table scan operations. Moreover, whether it is MySQL or
> > > > > PostgreSQL as the metadata storage medium, it can support
> > > > >  automatic update of the timestamp field, which is somewhat similar
> > to
> > > > the
> > > > > characteristics of triggers. So, have you encountered this problem
> > > > before?
> > > > > If so, how did you solve it? In addition, do you have any suggestions
> > > or
> > > > > comments on the above incremental acquisition of metadata? Please let
> > > me
> > > > > know, thanks a lot.
> > > > >
> > > > > Regards,
> > > > > Benedict Jin
> > > > >
> > > > > -
> > > > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > > For additional commands, e-mail: dev-h...@druid.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
> 




Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Ben Krug,

Thank you very much for your ideas, but I also feel that introducing Cassandra 
would be too heavy. The tombstone behavior you mentioned can actually be 
achieved with scheduled tasks in MySQL or PostgreSQL.

Regards,
Benedict Jin

On 2021/04/06 15:08:03, Ben Krug  wrote: 
> I suppose, if we were going down this path, something like tombstones in
> Cassandra could be used.
> But it would increase the complexity significantly.
> Ie, a new row is inserted with a deletion marker and a timestamp, that
> indicates that the corresponding row is deleted.
> Now, when anyone does scan the table, they need to check for tombstones too
> and process that logic.  Then, after
> a configurable amount of time, both the original row and the tombstone row
> can be cleaned up.
> 
> Probably a lot of work and complexity for this one use case, though.
> 
> On Tue, Apr 6, 2021 at 4:02 AM Abhishek Agarwal 
> wrote:
> 
> > If an entry is deleted from the metadata, how is the coordinator going to
> > update its own state?
> >
> > On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe  wrote:
> >
> > > Hey,
> > > I'm not a Druid developer, so it's quite possible I'm missing many
> > > considerations here, but from a first glance, I like your offer, as it
> > > resembles the *tsColumn *in JDBC lookups (
> > >
> > >
> > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > > ).
> > >
> > > Anyway, just my 2 cents.
> > >
> > > Thanks!
> > >   Itai
> > >
> > > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin  wrote:
> > >
> > > > Hi all,
> > > >
> > > > Recently, when the Coordinator in our company's Druid cluster pulls
> > > > metadata, there is a performance bottleneck. The main reason is the
> > huge
> > > > amount of metadata, which leads to a very slow process of scanning the
> > > full
> > > > table of metadata storage and deserializing metadata. The size of the
> > > full
> > > > metadata has been reduced through TTL, Compaction, Rollup, and etc.,
> > but
> > > > the effect is not very significant. Therefore, I want to design a
> > scheme
> > > > for Coordinator to pull metadata incrementally, that is, each time
> > > > Coordinator only pulls newly added metadata, so as to reduce the query
> > > > pressure of metadata storage and the pressure of deserializing
> > metadata.
> > > > The general idea is to add a column last_update to the druid_segments
> > > table
> > > > to record the update time of each record. Furthermore, when we query
> > the
> > > > metadata table, we can add filter conditions for the last_update column
> > > to
> > > > avoid full table scan operations. Moreover, whether it is MySQL or
> > > > PostgreSQL as the metadata storage medium, it can support
> > > >  automatic update of the timestamp field, which is somewhat similar to
> > > the
> > > > characteristics of triggers. So, have you encountered this problem
> > > before?
> > > > If so, how did you solve it? In addition, do you have any suggestions
> > or
> > > > comments on the above incremental acquisition of metadata? Please let
> > me
> > > > know, thanks a lot.
> > > >
> > > > Regards,
> > > > Benedict Jin
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > For additional commands, e-mail: dev-h...@druid.apache.org
> > > >
> > > >
> > >
> >
> 




Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Abhishek Agarwal,

You made a very important point, thank you very much.

Regards,
Benedict Jin

On 2021/04/06 11:02:34, Abhishek Agarwal  wrote: 
> If an entry is deleted from the metadata, how is the coordinator going to
> update its own state?
> 
> On Tue, Apr 6, 2021 at 3:38 PM Itai Yaffe  wrote:
> 
> > Hey,
> > I'm not a Druid developer, so it's quite possible I'm missing many
> > considerations here, but from a first glance, I like your offer, as it
> > resembles the *tsColumn *in JDBC lookups (
> >
> > https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> > ).
> >
> > Anyway, just my 2 cents.
> >
> > Thanks!
> >   Itai
> >
> > On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin  wrote:
> >
> > > Hi all,
> > >
> > > Recently, when the Coordinator in our company's Druid cluster pulls
> > > metadata, there is a performance bottleneck. The main reason is the huge
> > > amount of metadata, which leads to a very slow process of scanning the
> > full
> > > table of metadata storage and deserializing metadata. The size of the
> > full
> > > metadata has been reduced through TTL, Compaction, Rollup, and etc., but
> > > the effect is not very significant. Therefore, I want to design a scheme
> > > for Coordinator to pull metadata incrementally, that is, each time
> > > Coordinator only pulls newly added metadata, so as to reduce the query
> > > pressure of metadata storage and the pressure of deserializing metadata.
> > > The general idea is to add a column last_update to the druid_segments
> > table
> > > to record the update time of each record. Furthermore, when we query the
> > > metadata table, we can add filter conditions for the last_update column
> > to
> > > avoid full table scan operations. Moreover, whether it is MySQL or
> > > PostgreSQL as the metadata storage medium, it can support
> > >  automatic update of the timestamp field, which is somewhat similar to
> > the
> > > characteristics of triggers. So, have you encountered this problem
> > before?
> > > If so, how did you solve it? In addition, do you have any suggestions or
> > > comments on the above incremental acquisition of metadata? Please let me
> > > know, thanks a lot.
> > >
> > > Regards,
> > > Benedict Jin
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > For additional commands, e-mail: dev-h...@druid.apache.org
> > >
> > >
> >
> 




Re: Propose a scheme for Coordinator to pull metadata incrementally

2021-04-06 Thread Benedict Jin
Hi Itai Yaffe,

Thank you very much for your support.

Regards,
Benedict Jin

On 2021/04/06 10:06:45, Itai Yaffe  wrote: 
> Hey,
> I'm not a Druid developer, so it's quite possible I'm missing many
> considerations here, but from a first glance, I like your offer, as it
> resembles the *tsColumn *in JDBC lookups (
> https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global.html#jdbc-lookup
> ).
> 
> Anyway, just my 2 cents.
> 
> Thanks!
>   Itai
> 
> On Tue, Apr 6, 2021 at 6:07 AM Benedict Jin  wrote:
> 
> > Hi all,
> >
> > Recently, when the Coordinator in our company's Druid cluster pulls
> > metadata, there is a performance bottleneck. The main reason is the huge
> > amount of metadata, which leads to a very slow process of scanning the full
> > table of metadata storage and deserializing metadata. The size of the full
> > metadata has been reduced through TTL, Compaction, Rollup, and etc., but
> > the effect is not very significant. Therefore, I want to design a scheme
> > for Coordinator to pull metadata incrementally, that is, each time
> > Coordinator only pulls newly added metadata, so as to reduce the query
> > pressure of metadata storage and the pressure of deserializing metadata.
> > The general idea is to add a column last_update to the druid_segments table
> > to record the update time of each record. Furthermore, when we query the
> > metadata table, we can add filter conditions for the last_update column to
> > avoid full table scan operations. Moreover, whether it is MySQL or
> > PostgreSQL as the metadata storage medium, it can support
> >  automatic update of the timestamp field, which is somewhat similar to the
> > characteristics of triggers. So, have you encountered this problem before?
> > If so, how did you solve it? In addition, do you have any suggestions or
> > comments on the above incremental acquisition of metadata? Please let me
> > know, thanks a lot.
> >
> > Regards,
> > Benedict Jin
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >
> 




Propose a scheme for Coordinator to pull metadata incrementally

2021-04-05 Thread Benedict Jin
Hi all,

Recently, the Coordinator in our company's Druid cluster has hit a performance 
bottleneck when pulling metadata. The main cause is the sheer volume of 
metadata: scanning the full table in metadata storage and deserializing every 
record is very slow. We have already reduced the total metadata size through 
TTL, compaction, rollup, and so on, but the effect has not been significant.

Therefore, I want to design a scheme for the Coordinator to pull metadata 
incrementally, that is, to pull only newly added metadata on each run, which 
reduces both the query pressure on metadata storage and the cost of 
deserializing metadata. The general idea is to add a last_update column to the 
druid_segments table that records the update time of each record. When we 
query the metadata table, we can then filter on the last_update column to 
avoid full table scans. Moreover, whether the metadata store is MySQL or 
PostgreSQL, it can update the timestamp field automatically, somewhat like a 
trigger.

Have you encountered this problem before? If so, how did you solve it? Do you 
have any suggestions or comments on the incremental approach described above? 
Please let me know. Thanks a lot.
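
The incremental pull described above can be sketched as follows. This is a 
minimal illustration and not Druid code: it assumes a simplified, hypothetical 
segments table standing in for druid_segments, integer timestamps, and SQLite 
(through Python's sqlite3 module) instead of MySQL or PostgreSQL. The 
IncrementalPoller class and its watermark field are made-up names.

```python
import sqlite3

# Hypothetical, simplified stand-in for the druid_segments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments ("
    " id TEXT PRIMARY KEY,"
    " payload TEXT NOT NULL,"
    " last_update INTEGER NOT NULL)"
)
# An index on last_update lets the filter avoid a full table scan.
conn.execute("CREATE INDEX idx_segments_last_update ON segments (last_update)")


class IncrementalPoller:
    """Keeps an in-memory copy of the table and pulls only changed rows."""

    def __init__(self, connection):
        self.conn = connection
        self.watermark = -1  # highest last_update value seen so far
        self.cache = {}      # segment id -> payload

    def poll(self):
        # Fetch only rows that changed after the previous pull.
        rows = self.conn.execute(
            "SELECT id, payload, last_update FROM segments"
            " WHERE last_update > ? ORDER BY last_update",
            (self.watermark,),
        ).fetchall()
        for seg_id, payload, ts in rows:
            self.cache[seg_id] = payload
            self.watermark = max(self.watermark, ts)
        return len(rows)


conn.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [("seg-a", "{}", 100), ("seg-b", "{}", 110)],
)
poller = IncrementalPoller(conn)
first = poller.poll()   # initial pull reads both rows

conn.execute("INSERT INTO segments VALUES (?, ?, ?)", ("seg-c", "{}", 120))
second = poller.poll()  # later pull reads only the new row
```

Two caveats with this sketch. The strict "greater than" comparison can miss a 
row written with the same timestamp as the watermark after a pull has 
finished, so a real design would need either overlapping reads or a strictly 
increasing change counter. Physically deleted rows also never show up in the 
filtered query, so deletions need separate handling.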

Regards,
Benedict Jin




Re: New committer: Maytas Monsereenusorn

2020-07-07 Thread Benedict Jin
Congratulations!

On 2020/07/08 01:28:37, Gian Merlino  wrote: 
> Hey Druids,
> 
> The Druid PMC has invited Maytas Monsereenusorn (@maytasm on GitHub) to
> become a committer and we are
> pleased to announce that he has accepted. Maytas has contributed to various
> areas including automated testing improvements and bug fixes. He has also
> been active in reviewing the work of others, even before becoming a
> committer, which is always appreciated.
> 
> Congratulations Maytas!
> 




Re: New committer: Suneet Saldanha

2020-07-07 Thread Benedict Jin
Congratulations!

On 2020/07/08 01:28:56, Gian Merlino  wrote: 
> Hey Druids,
> 
> The Druid PMC has invited Suneet Saldanha (@suneet-s on GitHub) to become a
> committer and we are
> pleased to announce that he has accepted. Suneet has contributed to areas
> including the new join functionality, documentation, and general code
> quality. He has also been active in reviewing the work of others, even
> before becoming a committer, which is always appreciated.
> 
> Congratulations Suneet!
> 




Re: New committer: David Glasser

2020-07-07 Thread Benedict Jin
Congratulations!

On 2020/07/08 01:28:29, Gian Merlino  wrote: 
> Hey Druids,
> 
> The Druid PMC has invited David Glasser (@glasser on GitHub) to become a
> committer and we are
> pleased to announce that he has accepted. David has contributed an
> improved, parallelized self-ingestion firehose as well as various other
> patches. He has also participated in hosting and speaking at Druid meetups
> in the San Francisco Bay Area.
> 
> Congratulations David!
> 




Re: New committer: Lucas Capistrant

2020-07-07 Thread Benedict Jin
Congratulations!

On 2020/07/08 01:29:09, Gian Merlino  wrote: 
> Hey Druids,
> 
> The Druid PMC has invited Lucas Capistrant (@capistrant on GitHub) to become
> a committer and we are
> pleased to announce that he has accepted. Lucas has been active throughout
> the past year, contributing various enhancements and fixes.
> 
> Congratulations Lucas!
> 




Re: New committer: Maggie Brewster

2020-07-07 Thread Benedict Jin
Congratulations!

On 2020/07/08 01:34:19, Gian Merlino  wrote: 
> Hey Druids,
> 
> The Druid PMC has invited Maggie Brewster (@mcbrewster on GitHub) to become
> a committer and we are
> pleased to announce that she has accepted. Maggie has made dozens of
> contributions to Druid, especially to the (relatively) new web console.
> 
> Congratulations Maggie!
> 




Re: zk watch improvement

2020-01-12 Thread Benedict Jin
It's a great patch. I have also run into this situation. Thanks to 
@kaijianding for his impressive work. I can pick this up and finish it.

On 2020/01/11 12:33:31, Roman Leventov  wrote: 
> A good improvement is stuck because the author is unresponsive, any
> volunteers to pick it up and follow through?
> https://github.com/apache/druid/pull/6683
> 




Re: New committer: Alexander Saydakov

2020-01-07 Thread Benedict Jin
Congratulations!

On 2020/01/07 20:03:18, Jihoon Son  wrote: 
> Congratulations Alexander!
> 
> On Tue, Jan 7, 2020 at 11:25 AM Gian Merlino  wrote:
> 
> > Hey Druids,
> >
> > The Druid PMC has invited Alexander Saydakov (@AlexanderSaydakov on GitHub)
> > to become a committer and we are pleased to announce that he has accepted.
> > Alexander has contributed extensively to Druid's DataSketches extension,
> > and is also a committer and PPMC member on the Apache DataSketches project.
> >
> > Congratulations Alexander!
> >
> > Gian
> >
> 




Re: New committer: Samarth Jain

2020-01-02 Thread Benedict Jin
Congratulations!

On 2020/01/03 01:55:46, Jihoon Son  wrote: 
> Congrats Samarth!
> 
> On Thu, Jan 2, 2020 at 5:51 PM Gian Merlino  wrote:
> 
> > Hey Druids,
> >
> > The Druid PMC has invited Samarth Jain (@samarthjain on GitHub) to become
> > a committer and we
> > are pleased to announce that he has accepted. Samarth has contributed a
> > variety of improvements to Druid over the past year and has also given back
> > to the community by speaking at local meetups.
> >
> > Congratulations Samarth!
> >
> > Gian
> >
> 




Re: [VOTE] Apache Druid graduation to top level project

2019-12-04 Thread Benedict Jin
+1

On 2019/12/04 21:33:00, Clint Wylie  wrote: 
> +1
> 
> On Wed, Dec 4, 2019 at 1:21 PM Jihoon Son  wrote:
> 
> > +1
> >
> > On Wed, Dec 4, 2019 at 1:20 PM Furkan KAMACI 
> > wrote:
> >
> > > Hi,
> > >
> > > +1!
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > > On Wed, Dec 4, 2019 at 11:58 PM David Lim wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Dec 4, 2019 at 1:47 PM Fangjin Yang  wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, Dec 4, 2019 at 12:36 PM Julian Hyde 
> > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > >
> > > > > > > On Dec 4, 2019, at 12:19 PM, Gian Merlino 
> > wrote:
> > > > > > >
> > > > > > > Earlier this year, Druid voted to graduate to a top level
> > project.
> > > > The
> > > > > > vote
> > > > > > > passed and a resolution was submitted to the Board, but needed to
> > > be
> > > > > > > shelved at the last minute. We are now ready to proceed to
> > > graduation
> > > > > > once
> > > > > > > again, and so I would like to call another vote.
> > > > > > >
> > > > > > > For reference, our last vote thread:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/7517dd38fbeb28978b1188bed1d69e91445927d13d520a4c71ae9fcf%40%3Cdev.druid.apache.org%3E
> > > > > > >
> > > > > > > I would like to propose the resolution below. It is the same as
> > the
> > > > one
> > > > > > we
> > > > > > > voted on last time, except it adds new committers as initial PMC
> > > > > members
> > > > > > > (based on our previous decision to include any interested podling
> > > > > > > committers on the newly formed PMC).
> > > > > > >
> > > > > > > Vote:
> > > > > > > [ ] +1 - Recommend graduation of Apache Druid as a TLP
> > > > > > > [ ] -1 - Do not recommend graduation of Apache Druid because...
> > > > > > >
> > > > > > > The VOTE is open for a minimum of 72 hours.
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Establish Apache Druid Project
> > > > > > >
> > > > > > > WHEREAS, the Board of Directors deems it to be in the best
> > > > > > > interests of the Foundation and consistent with the Foundation's
> > > > > > > purpose to establish a Project Management Committee charged with
> > > > > > > the creation and maintenance of open-source analytical database
> > > > > > > software, for distribution at no charge to the public.
> > > > > > >
> > > > > > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > > > > > > Committee (PMC), to be known as the "The Apache Druid Project",
> > > > > > > be and hereby is established pursuant to Bylaws of the
> > > > > > > Foundation; and be it further
> > > > > > >
> > > > > > > RESOLVED, that The Apache Druid Project be and hereby is
> > > > > > > responsible for the creation and maintenance of an analytical
> > > > > > > database software project; and be it further
> > > > > > >
> > > > > > > RESOLVED, that the office of "Vice President, Druid" be and
> > > > > > > hereby is created, the person holding such office to serve at the
> > > > > > > direction of the Board of Directors as the chair of The Apache
> > > > > > > Druid Project, and to have primary responsibility for
> > > > > > > management of the projects within the scope of responsibility of
> > > > > > > The Apache Druid Project; and be it further
> > > > > > >
> > > > > > > RESOLVED, that the persons listed immediately below be and
> > > > > > > hereby are appoin

Re: new committer: Vadim Ogievetsky

2019-09-24 Thread Benedict Jin
Congratulations!

On 2019/09/25 00:32:18, Sashidhar Thallam  wrote: 
> Congratulations Vadim Ogievetsky 🎉
> 
> On Wed, Sep 25, 2019 at 4:31 AM Vadim Ogievetsky 
> wrote:
> 
> > Thank you for having me!
> > I am excited to work alongside all of you to make Druid the most user
> > friendly DB on the planet and to further this community.
> >
> > On 2019/09/24 20:20:44, Jonathan Wei  wrote:
> > > The Project Management Committee (PMC) for Apache Druid
> > > has invited Vadim Ogievetsky to become a committer and we are pleased
> > > to announce that he has accepted.
> > >
> > > Being a committer enables easier contribution to the
> > > project since there is no need to go via the patch
> > > submission process. This should enable better productivity.
> > > Being a PMC member enables assistance with the management
> > > and to guide the direction of the project.
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >
> 




Re: new committer: Furkan Kamaci

2019-09-18 Thread Benedict Jin
Congratulations!

On 2019/09/18 07:57:37, Sashidhar Thallam  wrote: 
> Congratulations 🎉
> 
> On Wed, Sep 18, 2019, 12:57 PM Sandish Kumar HN 
> wrote:
> 
> > Congratulations
> >
> > On Wed, Sep 18, 2019 at 12:26 AM Dusan Maric  wrote:
> >
> > > Congrats!
> > >
> > > On Tue, Sep 17, 2019 at 10:53 PM Jonathan Wei  wrote:
> > >
> > > > The Project Management Committee (PMC) for Apache Druid
> > > > has invited Furkan Kamaci to become a committer and we are pleased
> > > > to announce that he has accepted.
> > > >
> > > > Being a committer enables easier contribution to the
> > > > project since there is no need to go via the patch
> > > > submission process. This should enable better productivity.
> > > >
> > >
> > >
> > > --
> > > Dušan Marić
> > > mob.: +381 64 1124779 | e-mail: thema...@gmail.com | skype: themaric
> > >
> > --
> > Sent from Gmail Mobile
> >
> 




Re: new committer: Fokko Driesprong

2019-09-18 Thread Benedict Jin
Congratulations!

On 2019/09/18 07:58:03, Sashidhar Thallam  wrote: 
> Congratulations 🎉
> 
> On Wed, Sep 18, 2019, 12:57 PM Sandish Kumar HN 
> wrote:
> 
> > Congratulations
> >
> > On Wed, Sep 18, 2019 at 12:25 AM Dusan Maric  wrote:
> >
> > > Congrats!
> > >
> > > On Tue, Sep 17, 2019 at 10:53 PM Jonathan Wei  wrote:
> > >
> > > > The Project Management Committee (PMC) for Apache Druid
> > > > has invited Fokko Driesprong to become a committer and we are pleased
> > > > to announce that he has accepted.
> > > >
> > > > Being a committer enables easier contribution to the
> > > > project since there is no need to go via the patch
> > > > submission process. This should enable better productivity.
> > > >
> > >
> > >
> > > --
> > > Dušan Marić
> > > mob.: +381 64 1124779 | e-mail: thema...@gmail.com | skype: themaric
> > >
> > --
> > Sent from Gmail Mobile
> >
> 




Re: [VOTE] Release Apache Druid (incubating) 0.15.1 [RC2]

2019-08-04 Thread Benedict Jin



On 2019/08/01 23:38:30, Clint Wylie  wrote: 
> Hi all,
> 
> I have created a build for Apache Druid (incubating) 0.15.1, release
> candidate 2.
> 
> This is a bug fix release that includes important fixes for Apache
> Zookeeper based segment loading, the Apache DataSketches (incubating)
> extension, and much more. You can read the proposed release notes here:
> https://github.com/apache/incubator-druid/issues/8191
> 
> The release candidate has been tagged in GitHub as
> druid-0.15.1-incubating-rc2 (c698daab56a4b0612d6680b6359924653e938863),
> available here:
> 
> https://github.com/apache/incubator-druid/releases/tag/druid-0.15.1-incubating-rc2
> 
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/incubator/druid/0.15.1-incubating-rc2/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/cwylie.asc.
> 
> This key and the key of other committers can also be found in the project's
> KEYS file here:
> https://dist.apache.org/repos/dist/release/incubator/druid/KEYS
> 
> (If you are a committer, please feel free to add your own key to that file
> by following the instructions in the file's header.)
> 
> Verify checksums:
> diff <(shasum -a512 apache-druid-0.15.1-incubating-bin.tar.gz | cut -d ' ' -f1) <(cat apache-druid-0.15.1-incubating-bin.tar.gz.sha512 ; echo)
> 
> diff <(shasum -a512 apache-druid-0.15.1-incubating-src.tar.gz | cut -d ' ' -f1) <(cat apache-druid-0.15.1-incubating-src.tar.gz.sha512 ; echo)
> 
> Verify signatures:
> gpg --verify apache-druid-0.15.1-incubating-bin.tar.gz.asc apache-druid-0.15.1-incubating-bin.tar.gz
> 
> gpg --verify apache-druid-0.15.1-incubating-src.tar.gz.asc apache-druid-0.15.1-incubating-src.tar.gz
> 
> Please review the proposed artifacts and vote. Note that Apache has
> specific requirements that must be met before +1 binding votes can be cast
> by PMC members. Please refer to the policy at
> http://www.apache.org/legal/release-policy.html#policy for more details.
> 
> As part of the validation process, the release artifacts can be generated
> from source by running:
> mvn clean install -Papache-release,dist
> 
> The RAT license check can be run from source by:
> mvn apache-rat:check -Prat
> 
> This vote will be open for at least 72 hours. The vote will pass if a
> majority of at least three +1 PMC votes are cast.
> 
> Once the vote has passed, the second stage vote will be called on the
> Apache Incubator mailing list to get approval from the Incubator PMC.
> 
> [ ] +1 Release this package as Apache Druid (incubating) 0.15.1
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
> 
> Thanks!
> 
> Apache Druid (incubating) is an effort undergoing incubation at The Apache
> Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is
> required of all newly accepted projects until a further review indicates
> that the infrastructure, communications, and decision making process have
> stabilized in a manner consistent with other successful ASF projects. While
> incubation status is not necessarily a reflection of the completeness or
> stability of the code, it does indicate that the project has yet to be
> fully endorsed by the ASF.
> 

+1

 - verified signature and checksum
 - checked NOTICE, LICENSE, DISCLAIMER
 - compiled source and ran tests
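
For anyone who prefers to script the checksum step from the quoted instructions, here is a minimal sketch in Python. It assumes the .sha512 file holds the hex digest as its first whitespace-separated token, which is what the quoted `diff` command implies. The helper names `sha512_hex` and `checksum_matches` are made up for illustration and are not part of any Druid tooling.

```python
import hashlib
from pathlib import Path


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare the artifact's digest with the first token of the checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by a file name. An empty checksum file counts as a mismatch.
    """
    tokens = Path(checksum_file).read_text().split()
    if not tokens:
        return False
    return sha512_hex(artifact) == tokens[0].lower()
```

A call such as `checksum_matches("apache-druid-0.15.1-incubating-bin.tar.gz", "apache-druid-0.15.1-incubating-bin.tar.gz.sha512")` would cover the binary artifact. The signature check still needs `gpg --verify` as quoted above.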


-
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org



Re: [VOTE] Apache Druid graduation to top level project

2019-07-10 Thread Benedict Jin



On 2019/07/10 16:52:21, Gian Merlino  wrote: 
> Following our discussion on the dev mailing list (
> https://lists.apache.org/thread.html/68fcc3d2fc66c7f559e2c9dd02478d17f195565fecdb07ed53bc965e@%3Cdev.druid.apache.org%3E),
> I would like to call a vote for Apache Druid graduating to a top level
> project.
> 
> If this vote passes, the next step would be to submit the resolution below
> to the Incubator PMC, who would vote on sending it on to the Apache Board.
> 
> Vote:
> [ ] +1 - Recommend graduation of Apache Druid as a TLP
> [ ] -1 - Do not recommend graduation of Apache Druid because...
> 
> The VOTE is open for a minimum of 72 hours.
> 
> 
> 
> Establish Apache Druid Project
> 
> WHEREAS, the Board of Directors deems it to be in the best
> interests of the Foundation and consistent with the Foundation's
> purpose to establish a Project Management Committee charged with
> the creation and maintenance of open-source analytical database
> software, for distribution at no charge to the public.
> 
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> Committee (PMC), to be known as the "The Apache Druid Project",
> be and hereby is established pursuant to Bylaws of the
> Foundation; and be it further
> 
> RESOLVED, that The Apache Druid Project be and hereby is
> responsible for the creation and maintenance of an analytical
> database software project; and be it further
> 
> RESOLVED, that the office of "Vice President, Druid" be and
> hereby is created, the person holding such office to serve at the
> direction of the Board of Directors as the chair of The Apache
> Druid Project, and to have primary responsibility for
> management of the projects within the scope of responsibility of
> The Apache Druid Project; and be it further
> 
> RESOLVED, that the persons listed immediately below be and
> hereby are appointed to serve as the initial members of The
> Apache Druid Project:
> 
>   * Benedict Jin      (asdf2...@apache.org)
>   * Charles Allen     (cral...@apache.org)
>   * Clint Wylie       (cwy...@apache.org)
>   * David Lim         (david...@apache.org)
>   * Dylan Wylie       (dylanwy...@apache.org)
>   * Eric Tschetter    (ched...@apache.org)
>   * Fangjin Yang      (f...@apache.org)
>   * Gian Merlino      (g...@apache.org)
>   * Himanshu Gupta    (himans...@apache.org)
>   * Jihoon Son        (jihoon...@apache.org)
>   * Jonathan Wei      (jon...@apache.org)
>   * Julian Hyde       (jh...@apache.org)
>   * Kurt Young        (k...@apache.org)
>   * Lijin Bin         (binli...@apache.org)
>   * Maxime Beauchemin (maximebeauche...@apache.org)
>   * Niketh Sabbineni  (nik...@apache.org)
>   * Nishant Bangarwa  (nish...@apache.org)
>   * P. Taylor Goetz   (ptgo...@apache.org)
>   * Parag Jain        (pja...@apache.org)
>   * Roman Leventov    (leven...@apache.org)
>   * Slim Bouguerra    (bs...@apache.org)
>   * Surekha Saharan   (sure...@apache.org)
>   * Xavier Léauté     (x...@apache.org)
> 
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Gian Merlino
> be and hereby is appointed to the office of Vice President,
> Druid, to serve in accordance with and subject to the direction
> of the Board of Directors and the Bylaws of the Foundation until
> death, resignation, retirement, removal or disqualification, or
> until a successor is appointed; and be it further
> 
> RESOLVED, that the initial Apache Druid Project be and hereby
> is tasked with the migration and rationalization of the Apache
> Incubator Druid podling; and be it further
> 
> RESOLVED, that all responsibility pertaining to the Apache
> Incubator Druid podling encumbered upon the Apache Incubator
> PMC are hereafter discharged.
> 

+1




Copyright Question

2019-05-14 Thread Benedict Jin
Hello everyone, may I ask a question about image copyright? I wrote an
article about Apache Druid that uses a few pictures from the official
Apache Druid website, and I am not sure whether this raises a copyright
issue. If it does and the images may not be reused, I will remove them
from my blog.

FYI, https://yuzhouwan.com/posts/5845/

-- 
宇宙湾(https://yuzhouwan.com)


Optimize images by ImageBot

2019-05-06 Thread Benedict Jin
ImageBot can be enabled in a few simple steps. According to the current
test results, we can reduce the image file size of Apache Druid by 31%.

FYI, https://github.com/asdf2014/incubator-druid/pull/1

-- 
宇宙湾(https://yuzhouwan.com)


Re: Druid 0.14 timing

2019-01-07 Thread Benedict Jin



On 2019/01/04 21:06:40, Gian Merlino  wrote: 
> It feels like 0.13.0 was just recently released, but it was branched off
> back in October, and it has almost been 3 months since then. How do we feel
> about doing an 0.14 branch cut at the end of January (Thu Jan 31) - going
> back to the every 3 months cycle?
> 
> For this release, based on the feedback we got from the Incubator vote last
> time, we'll need to fix up the LICENSE and NOTICE issues that were flagged
> but waved through for our first release. (Justin said he would have -1'd
> based on that if it was anything beyond a first release.)
> 

+1
