Re: [DISCUSS] Dropping support for Java 8

2024-07-26 Thread Maytas Monsereenusorn
I agree with adding back Nashorn as a dependency so that users can still
use the JavaScript functionality. The standalone Nashorn (that we can add
back as a dependency) does not work with Java 10 or earlier; it only works
with Java 11 and later (see the compatibility table at
https://github.com/szegedi/nashorn/wiki/Using-Nashorn-with-different-Java-versions
).
I closed my PR https://github.com/apache/druid/pull/14795 since we are
still supporting Java 8. Once we drop support for Java 8, we can switch to
the standalone Nashorn dependency. The other option is to ship different
extensions for Java 10 and below vs. Java 11 and above, but that is
confusing, and we will drop support for Java 8 anyway.
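To keep the version constraints from this thread straight: the built-in Nashorn engine shipped with the JDK through Java 14 (it was removed in Java 15), while the standalone Nashorn artifact runs only on Java 11 and later. A small illustrative helper summarizing that matrix (my own sketch, not project code):

```shell
# Illustrative only: which Nashorn option applies per Java major version,
# per the compatibility constraints discussed in the thread.
nashorn_options() {
  major="$1"
  if [ "$major" -lt 11 ]; then
    echo "built-in only"            # e.g. Java 8: standalone Nashorn won't run
  elif [ "$major" -le 14 ]; then
    echo "built-in or standalone"   # overlap window: Java 11-14
  else
    echo "standalone only"          # Java 15+: engine removed from the JDK
  fi
}

nashorn_options 8    # built-in only
nashorn_options 17   # standalone only
```

This is why the extension can only move to the standalone dependency once Java 8 (and 10) support is dropped: there is no version of standalone Nashorn that covers the older runtimes.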



On Thu, Jul 25, 2024 at 3:01 PM Gian Merlino  wrote:

> It would make sense to me to add back Nashorn as an optional dependency,
> i.e. moving the Javascript stuff to an extension.
>
> On Tue, Jul 16, 2024 at 5:32 PM Maytas Monsereenusorn 
> wrote:
>
> > +1 from me too.
> >
> > Also not strictly related, but Javascript tiered broker/worker selector
> > strategy and Javascript filter is broken if running Druid on Java 17.
> > The Nashorn JavaScript Engine has been removed since Java 15 and hence,
> we
> > would need to use a different script engine or add back the Nashorn
> > Javascript Engine as a dependency. (more details at
> > https://github.com/apache/druid/pull/14795)
> >
> > On Tue, Jul 16, 2024 at 2:58 PM Clint Wylie  wrote:
> >
> > > +1 from me for deprecating first then removing. I'd also like to
> > > officially support java 21 before we totally drop support, at least
> > > experimentally, but preferably fully. We already run unit tests with
> > > 21, so maybe we could transition the java 8 integration tests to use
> > > 21 instead? I've also been using 21 for all of my debugging and
> > > testing for quite some time now and it seems fine to me.
> > >
> > > I know this isn't strictly related, but with 8 still supported maybe
> > > it just seems like we are kind of slow and cautious, but if we drop 8,
> > > it seems like our java support is in a strange place of moderately old
> > > versions if we only officially support 11 and 17, given 21 is also an
> > > LTS (even 11 is starting to seem a bit old to me).
> > >
> > > On Tue, Jul 16, 2024 at 9:21 AM Gian Merlino  wrote:
> > > >
> > > > I think this is a good move. Let's give users some warning by
> > > deprecating it first prior to removal. IMO, good timing would be to
> > > deprecate Java 8 in the next major Druid release (Druid 31). That
> means a
> > > doc update, release note update, and updating the start scripts to log
> a
> > > warning that support for Java 8 will be removed soon (if Java 8 is
> > > detected).
> > > >
> > > > When we do remove support for Java 8, we should update the
> > "verify-java"
> > > script to require DRUID_SKIP_JAVA_CHECK=1 when Java 8 is detected.
> > > >
> > > > Gian
> > > >
> > > > On 2024/07/16 04:17:06 Abhishek Agarwal wrote:
> > > > > Hello everyone,
> > > > > Starting this thread to discuss, if and when, we can drop Java 8
> > > support.
> > > > > We have been fully supporting Java 11 and Java 17 for a while now.
> > > Anyone,
> > > > > who is looking to upgrade Druid, can safely select either of these
> > LTS
> > > Java
> > > > > runtimes. There are a few important reasons to drop Java 8 support
> > > > >
> > > > > - It adds extra burden on build/test pipelines to test all these
> > > different
> > > > > runtimes. We want to shrink this matrix of Java runtime and test
> > > suites.
> > > > > - Being on Java 8 will block us from upgrading dependencies that
> have
> > > > > dropped Java 8 support. We can get around it by building profiles
> and
> > > shims
> > > > > but it adds more complexity. One example is pac4j which is Java 11
> > > based
> > > > > from 5.x.
> > > > > - As we drop support for older Java releases, developers can use
> the
> > > > > features offered by the more advanced Java versions.
> > > > >
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > > > For additional commands, e-mail: dev-h...@druid.apache.org
> > > >
> > >
> > >
> >
>


Re: [DISCUSS] Dropping support for Java 8

2024-07-16 Thread Maytas Monsereenusorn
+1 from me too.

Also not strictly related, but the JavaScript tiered broker/worker selector
strategy and the JavaScript filter are broken when running Druid on Java 17.
The Nashorn JavaScript engine was removed in Java 15, so we would need to
use a different script engine or add back the Nashorn engine as a
dependency. (More details at https://github.com/apache/druid/pull/14795)

On Tue, Jul 16, 2024 at 2:58 PM Clint Wylie  wrote:

> +1 from me for deprecating first then removing. I'd also like to
> officially support java 21 before we totally drop support, at least
> experimentally, but preferably fully. We already run unit tests with
> 21, so maybe we could transition the java 8 integration tests to use
> 21 instead? I've also been using 21 for all of my debugging and
> testing for quite some time now and it seems fine to me.
>
> I know this isn't strictly related, but with 8 still supported maybe
> it just seems like we are kind of slow and cautious, but if we drop 8,
> it seems like our java support is in a strange place of moderately old
> versions if we only officially support 11 and 17, given 21 is also an
> LTS (even 11 is starting to seem a bit old to me).
>
> On Tue, Jul 16, 2024 at 9:21 AM Gian Merlino  wrote:
> >
> > I think this is a good move. Let's give users some warning by
> deprecating it first prior to removal. IMO, good timing would be to
> deprecate Java 8 in the next major Druid release (Druid 31). That means a
> doc update, release note update, and updating the start scripts to log a
> warning that support for Java 8 will be removed soon (if Java 8 is
> detected).
> >
> > When we do remove support for Java 8, we should update the "verify-java"
> script to require DRUID_SKIP_JAVA_CHECK=1 when Java 8 is detected.
> >
> > Gian
> >
> > On 2024/07/16 04:17:06 Abhishek Agarwal wrote:
> > > Hello everyone,
> > > Starting this thread to discuss, if and when, we can drop Java 8
> support.
> > > We have been fully supporting Java 11 and Java 17 for a while now.
> Anyone,
> > > who is looking to upgrade Druid, can safely select either of these LTS
> Java
> > > runtimes. There are a few important reasons to drop Java 8 support
> > >
> > > - It adds extra burden on build/test pipelines to test all these
> different
> > > runtimes. We want to shrink this matrix of Java runtime and test
> suites.
> > > - Being on Java 8 will block us from upgrading dependencies that have
> > > dropped Java 8 support. We can get around it by building profiles and
> shims
> > > but it adds more complexity. One example is pac4j which is Java 11
> based
> > > from 5.x.
> > > - As we drop support for older Java releases, developers can use the
> > > features offered by the more advanced Java versions.
> > >
> >
>
>
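The verify-java change Gian proposes in the quoted message above could look roughly like this. This is a hedged sketch only: the `check_java` function, its messages, and the version-string parsing are my own illustration, not the actual verify-java script in the Druid repo; only the `DRUID_SKIP_JAVA_CHECK=1` escape hatch comes from the thread.

```shell
# Illustrative sketch, not the real verify-java script: fail on Java 8
# unless DRUID_SKIP_JAVA_CHECK=1 is set, as proposed in the thread.
check_java() {
  ver="$1"                        # e.g. "1.8.0_292" or "17.0.2"
  major="${ver%%.*}"
  if [ "$major" = "1" ]; then     # legacy "1.x" scheme used through Java 8
    major="${ver#1.}"
    major="${major%%.*}"
  fi
  if [ "$major" -le 8 ] && [ "${DRUID_SKIP_JAVA_CHECK:-}" != "1" ]; then
    echo "unsupported: Java $major detected; set DRUID_SKIP_JAVA_CHECK=1 to skip this check"
    return 1
  fi
  echo "ok: Java $major"
}

check_java "17.0.2"
```

In real use the version string would come from parsing `java -version` output rather than being passed in directly.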


Re: Spark Druid connectors, take 2

2023-10-26 Thread Maytas Monsereenusorn
I am still in favor of and support getting the Spark Druid connector working and 
merged to master.
I am also still planning on taking this up but haven’t had time yet. If anyone 
else is interested and wants to pick this up, that would be great too!

Thanks,
Maytas
Sent from my iPhone

> On Oct 25, 2023, at 6:25 AM, Will Xu <2beth...@gmail.com> wrote:
> 
> Hi,
> I want to revive this thread a bit. How's people's latest feeling about
> getting Spark working? I want to see if we can coordinate a proposal
> together.
> Regards,
> Will
> 
>> On Thu, Aug 10, 2023 at 3:05 AM Maytas Monsereenusorn 
>> wrote:
>> 
>> Hi all,
>> 
>> First of all, thank you Julian for bringing this up and starting the
>> conversation.
>> Just to chime in on our (Netflix) use cases.
>> We use Spark 3.3 and would benefit from both reader and writer.
>> For the writer, we currently have a Spark job that writes data (from a
>> Spark job) to an intermediate Iceberg table. We would then separately issue
>> a Druid batch ingestion to consume from this intermediate Iceberg table (by
>> passing the S3 paths of the table). Having write support from within Spark
>> job (to Druid) would help us eliminate this intermediate Iceberg table and
>> simplify our workflow (possibly also reducing our storage and compute
>> cost). To answer your question, I think this will be more aligned
>> with having a spark job targeting a druid cluster.
>> For the reader, we would like to be able to export data from Druid (such as
>> moving Druid data into an Iceberg table) and also joining/further
>> processing of Druid data with other (non-Druid) data (such as other Iceberg
>> tables) within Spark jobs. To answer your question, I think this will be
>> more aligned with the reader in Spark job reading Druid segment files
>> directly.
>> 
>> Thanks,
>> Maytas
>> 
>> 
>> 
>> On Wed, Aug 9, 2023 at 2:14 PM Rajiv Mordani 
>> wrote:
>> 
>>> Will, Julian,
>>>See responses below tagged with [Rajiv] in blue:
>>> 
>>> From: Will Xu 
>>> Date: Tuesday, August 8, 2023 at 9:27 AM
>>> To: dev@druid.apache.org 
>>> Subject: Re: Spark Druid connectors, take 2
>>> 
>>> For which version to target, I think we should survey the Druid community
>>> and get input. In your case, which version are you currently deploying?
>>> Historical experience tells me we should target current and current-1
>>> (3.4.x and 3.3.x)
>>> 
>>> 
>>> [Rajiv] Version should be fine at least for our use cases.
>>> 
>>> 
>>> In terms of the writer (Spark writes to Druid), what's the user workflow
>>> you envision? Would you think the user would trigger a spark job from
>>> Druid? Or is this user who is submitting a Spark job to target a Druid
>>> cluster? The former allows other systems, like compaction, for example,
>> to
>>> use Spark as a runner.
>>> 
>>> 
>>> [Rajiv] For us it is the latter. Where a spark job targets a druid
>> cluster.
>>> 
>>> 
>>> In terms of the reader (Spark reads Druid). I'm most curious to find out
>>> what experience you are imagining. Should the reader be reading Druid
>>> segment files or would the reader issue queries to Druid (maybe even to
>>> historicals?) so that query can be parallelized?
>>> 
>>> 
>>> [Rajiv] Segments is going to be tricky specially with things like
>>> compaction etc. I think we definitely need to be able to query hot cache
>> as
>>> well. So not just segments / historicals.
>>> 
>>> 
>>> Of the two, there is a lot more interest in the writer from the people
>> I've
>>> been talking to.
>>> 
>>> 
>>> [Rajiv] We need both read and write for the different kinds of jobs.
>>> 
>>> Responses to Julian’s asks in-line below:
>>> 
>>> Regards,
>>> Will
>>> 
>>> 
>>> On Tue, Aug 8, 2023 at 8:50 AM Julian Jaffe 
>>> wrote:
>>> 
>>>> Hey all,
>>>> 
>>>> There was talk earlier this year about resurrecting the effort to add
>>>> direct Spark readers and writers to Druid. Rather than repeat the
>>> previous
>>>> attempt and parachute in with updated connectors, I’d like to start by
>>>> building a little more consensus around what the Druid dev community
>>> wants
>>>> as potential maintainers.

Re: Spark Druid connectors, take 2

2023-08-09 Thread Maytas Monsereenusorn
Hi all,

First of all, thank you Julian for bringing this up and starting the
conversation.
Just to chime in on our (Netflix) use cases.
We use Spark 3.3 and would benefit from both a reader and a writer.
For the writer, we currently have a Spark job that writes data to an
intermediate Iceberg table. We then separately issue a Druid batch
ingestion to consume from this intermediate Iceberg table (by passing the
S3 paths of the table). Having write support to Druid from within a Spark
job would let us eliminate this intermediate Iceberg table and simplify our
workflow (possibly also reducing our storage and compute cost). To answer
your question, this is more aligned with a Spark job targeting a Druid
cluster.
For the reader, we would like to be able to export data from Druid (such as
moving Druid data into an Iceberg table) and also join or further process
Druid data with other, non-Druid data (such as other Iceberg tables) within
Spark jobs. To answer your question, this is more aligned with the reader
in a Spark job reading Druid segment files directly.

Thanks,
Maytas



On Wed, Aug 9, 2023 at 2:14 PM Rajiv Mordani 
wrote:

> Will, Julian,
> See responses below tagged with [Rajiv] in blue:
>
> From: Will Xu 
> Date: Tuesday, August 8, 2023 at 9:27 AM
> To: dev@druid.apache.org 
> Subject: Re: Spark Druid connectors, take 2
>
> For which version to target, I think we should survey the Druid community
> and get input. In your case, which version are you currently deploying?
> Historical experience tells me we should target current and current-1
> (3.4.x and 3.3.x)
>
>
> [Rajiv] Version should be fine at least for our use cases.
>
>
> In terms of the writer (Spark writes to Druid), what's the user workflow
> you envision? Would you think the user would trigger a spark job from
> Druid? Or is this user who is submitting a Spark job to target a Druid
> cluster? The former allows other systems, like compaction, for example, to
> use Spark as a runner.
>
>
> [Rajiv] For us it is the latter. Where a spark job targets a druid cluster.
>
>
> In terms of the reader (Spark reads Druid). I'm most curious to find out
> what experience you are imagining. Should the reader be reading Druid
> segment files or would the reader issue queries to Druid (maybe even to
> historicals?) so that query can be parallelized?
>
>
> [Rajiv] Segments is going to be tricky specially with things like
> compaction etc. I think we definitely need to be able to query hot cache as
> well. So not just segments / historicals.
>
>
> Of the two, there is a lot more interest in the writer from the people I've
> been talking to.
>
>
> [Rajiv] We need both read and write for the different kinds of jobs.
>
> Responses to Julian’s asks in-line below:
>
> Regards,
> Will
>
>
> On Tue, Aug 8, 2023 at 8:50 AM Julian Jaffe 
> wrote:
>
> > Hey all,
> >
> > There was talk earlier this year about resurrecting the effort to add
> > direct Spark readers and writers to Druid. Rather than repeat the
> previous
> > attempt and parachute in with updated connectors, I’d like to start by
> > building a little more consensus around what the Druid dev community
> wants
> > as potential maintainers.
> >
> > To begin with, I want to solicit opinions on two topics:
> >
> > Should these connectors be written in Scala or Java? The benefits of
> Scala
> > would be that the existing connectors are written in Scala, as are most
> > open source references for Spark Datasource V2 implementations. The
> > benefits of Java are that Druid is written in Java, and so engineers
> > interested in contributing to Druid wouldn’t need to switch between
> > languages. Additionally, existing tooling, static checkers, etc. could be
> > used with minimal effort, conforming code style and developer ergonomics
> > across Druid instead of needing to keep an alternate Scala tool chain in
> > sync.
>
> [Rajiv] We need Java support.
>
>
> > Which Spark version should this effort target? The most recently released
> > version of Spark is 3.4.1. Should we aim to integrate with the latest
> Spark
> > minor version under the assumption that this will give us the longest
> > window of support, or should we build against an older minor line (3.3?
> > 3.2?) since most Spark users tend to lag? For reference, there are
> > currently 3 stable Spark release versions, 3.2.4, 3.3.2, and 3.4.1. From
> a
> > user’s point of view, the API is mostly compatible across a major version
> > (i.e. 3.x), while developer APIs such as the ones we would use to build
> > these connectors can change between minor versions.
> > There are quite a few nuances and trade offs inherent to the decisions
> > above, and my hope is that by hashing these choices out before presenting
> > an implementation we can build buy-in from the Druid maintainer community
> > that will result in this effort succeeding where the first effort failed.
>

New Committer : Jason Koch

2023-04-17 Thread Maytas Monsereenusorn
Hello everyone,

The Project Management Committee (PMC) for Apache Druid has invited
Jason to become a committer and we are pleased to announce that
Jason has accepted.

Jason has been working on Apache Druid for about two years now. Jason's
focus has been in the area of performance and cost savings. His most
notable contributions are performance improvements to Druid ingestion,
both batch and streaming. Jason also introduced a new "tree" type to the
flattenSpec, allowing faster JSON parsing for certain use cases.

Congratulations Jason.


Re: CI requiring approval for external contributors

2023-03-29 Thread Maytas Monsereenusorn
I think there is also an option of "Only requires approval first time".
This is what Apache Iceberg had been using before ASF GitHub repos had
their GitHub Actions defaults changed, and it is most likely what they will
go back to.



On Wed, Mar 29, 2023 at 6:43 AM Karan Kumar 
wrote:

> +1
> Pasting some of the examples that I shared on slack:
>
>- PR : https://github.com/apache/druid/pull/13934 which is raised by
>Soumyava Das who contributes regularly to druid should not block on a
>committer to approve CI runs.
>- PR : https://github.com/apache/druid/pull/13909 by Adarsh Sanjeev who
>has added a lot of features in MSQE.
>- PR : https://github.com/apache/druid/pull/13991 by Jason Witkowski
> who
>contributes regularly to druid helm charts.
>
>
> On Wed, Mar 29, 2023 at 10:55 AM Frank Chen  wrote:
>
> > +1
> >
> > I don't see there's a need for committers to click the 'Approve' button
> to
> > run CI for every PR.
> >
> >
> >
> > On Tue, Mar 28, 2023 at 2:27 PM Austin Bennett 
> wrote:
> >
> > > Beam did the same.
> > >
> > > +1 on allowing contributors to keep things moving [where appropriate,
> > > which this can be if CI is setup well], allowing committers to focus on
> > > more/other activities
> > >
> > > On Mon, Mar 27, 2023, 11:24 PM Gian Merlino  wrote:
> > >
> > > > Recently, ASF GitHub repos had their defaults for GitHub Actions
> > changed
> > > to
> > > > "always require approval for external contributors". In Slack, Karan
> > > > pointed out that Airflow has recently submitted a ticket to have that
> > > > changed back: https://issues.apache.org/jira/browse/INFRA-24200.
> IMO,
> > we
> > > > should do the same. I don't think we have a problem with fake PRs,
> but
> > we
> > > > can always improve our responsiveness to contributors from outside
> the
> > > > project! Every little bit helps, including running CI automatically.
> > > >
> > > > If others have opinions on this, let me know. I'd like to raise our
> own
> > > > ticket to change our default.
> > > >
> > > > Gian
> > > >
> > >
> >
>
>
> --
> Thanks
> Karan
>


Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-22 Thread Maytas Monsereenusorn
Hi Julian,

Thank you so much for your contributions to Spark support. As an existing
committer, I would like to help get the Spark connector merged into OSS
(including PR reviews and any other development work that may be needed).
We can move the conversation regarding Spark support into a new thread, or
reuse the GitHub issue already opened, to keep this thread on topic with
dropping support for Hadoop 2.x.

Best Regards,
Maytas

On Sun, Aug 21, 2022 at 11:55 PM Julian Jaffe 
wrote:

> For Spark support, the connector I wrote remains functional but I haven’t
> updated the PR for six months or so since it didn’t seem like there was an
> appetite for review. If that’s changing I could migrate back some more
> recent changes to the OSS PR. Even with an up-to-date patch though I see
> two problems:
>
> First, I remain worried that there isn’t sufficient support among
> committers for the Spark connector. I don’t want Druid to end up in the
> same place it is now for Hadoop 2 support where no one really maintains the
> Spark code and we wind up with another awkward corner of the code base that
> holds back other development.
>
> Secondly, the PR I have up is for Spark 2.4, which is now 2 years further
> out of date than it was back in 2020. Similarly to Hadoop there is a
> bifurcation in the community and Spark 2.4 is still in heavy use but we
> might be trading one problem for another if we deprecate Hadoop 2 in favor
> of Spark 2.4. I have written a Spark 3.2 connector as well but it’s been
> deployed to significantly smaller use cases than the 2.4 line.
>
> Even with these two caveats, if there’s a desire among the Druid
> development community to add Spark functionality and support it I’d love to
> push this across the finish line.
>
> > On Aug 9, 2022, at 1:04 AM, Abhishek Agarwal 
> wrote:
> >
> > Yes. We should deprecate it first which is similar to dropping the
> support
> > (no more active development) but we will still ship it for a release or
> > two. In a way, we are already in that mode to a certain extent. Many
> > features are being built with native ingestion as a first-class citizen.
> > E.g. range partitioning is still not supported on Hadoop ingestion. It's
> > hard for developers to build and test their business logic for all the
> > ingestion modes.
> >
> > It will be good to hear what gaps do community sees between native
> > ingestion vs Hadoop-based batch ingestion. And then work toward fixing
> > those gaps before dropping the Hadoop ingestion entirely. For example, if
> > users want the resource elasticity that a Hadoop cluster gives, we could
> > push forward PRs such as https://github.com/apache/druid/pull/10910.
> It's
> > not the same as a Hadoop cluster but nonetheless will let user reuse
> their
> > existing infrastructure to run druid jobs.
> >
> >> On Tue, Aug 9, 2022 at 9:43 AM Gian Merlino  wrote:
> >>
> >> It's always good to deprecate things for some time prior to removing
> them,
> >> so we don't need to (nor should we) remove Hadoop 2 support right now.
> My
> >> vote is that in this upcoming release, we should deprecate it. The main
> >> problem in my eyes is the one Abhishek brought up: the dependency
> >> management situation with Hadoop 2 is really messy, and I'm not sure
> >> there's a good way to handle them given the limited classloader
> isolation.
> >> This situation becomes tougher to manage with each release, and we
> haven't
> >> had people volunteering to find and build comprehensive solutions. It is
> >> time to move on.
> >>
> >> The concern Samarth raised, that people may end up stuck on older Druid
> >> versions because they aren't able to upgrade to Hadoop 3, is valid. I
> can
> >> see two good solutions to this. First: we can improve native ingest to
> the
> >> point where people feel broadly comfortable moving Hadoop 2 workloads to
> >> native. The work planned as part of doing ingest via multi-stage
> >> distributed query  is
> going
> >> to be useful here, by improving the speed and scalability of native
> ingest.
> >> Second: it would also be great to have something similar that runs on
> >> Spark, for people that have made investments in Spark. I suspect that
> most
> >> people that used Hadoop 2 have moved on to Hadoop 3 or Spark, so
> supporting
> >> both of those would ease a lot of the potential pain of dropping Hadoop
> 2
> >> support.
> >>
> >> On Spark: I'm not familiar with the current state of the Spark work. Is
> it
> >> stuck? If so could something be done to unstick it? I agree with
> Abhishek
> >> that I wouldn't want to block moving off Hadoop 2 on this. However,
> it'd be
> >> great if we could get it done before actually removing Hadoop 2 support
> >> from the code base.
> >>
> >>
> >> On Wed, Aug 3, 2022 at 6:17 AM Abhishek Agarwal <
> abhishek.agar...@imply.io
> >>>
> >> wrote:
> >>
> >>> I was thinking that moving from Hadoop 2 to Hadoop 3 will be a
> >>> low-resistance path than movi

Re: PR issues

2022-03-22 Thread Maytas Monsereenusorn
Hi Shani,

Looks like all the tests passed. I have approved the PR and merged it into
master.
Thank you for your contribution!!

Best Regards,
Maytas

On Tue, Mar 22, 2022 at 7:11 AM Shani Yacobovitz <
shani.yacobov...@forescout.com> wrote:

> Hello,
> We have created a PR lately that is related to the Prometheus emitter:
> https://github.com/apache/druid/pull/12296
>
> The PR was approved but its build failed and it seems to me that the
> reason is some flaky tests.
> We would be happy if you could look and explain to us the reason that
> caused the failure.
>
> We would be happy to continue the progress of this change.
>
> Thank you in advance,
> Shani Yacobovitz
>


Re: [VOTE] Release Apache Druid 0.22.0 [RC1]

2021-09-20 Thread Maytas Monsereenusorn
+1 (binding)

src
- verified the signature and checksum
- LICENSE and NOTICE present
- compiled and ran the licenses.yaml file check
- compiled and ran s3 deep storage integration tests
- ran RAT check
- built binary, ingested some data using batch and ran some queries

bin
- verified the signature and checksum
- LICENSE and NOTICE present
- ingested some data using batch ingestion and ran some queries
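The "verified the signature and checksum" items above amount to recomputing a digest and comparing it against the published `.sha512` file. A self-contained illustration, using a throwaway file in place of a real release artifact (and GNU `sha512sum` in place of the `shasum -a512` invocation shown in the release instructions):

```shell
# Stand-in demo of checksum verification; artifact.tar.gz here is a
# throwaway file, not a real release artifact.
cd "$(mktemp -d)"
printf 'example artifact' > artifact.tar.gz
# "Publisher" side: record the SHA-512 digest alongside the artifact.
sha512sum artifact.tar.gz | cut -d' ' -f1 > artifact.tar.gz.sha512
# "Verifier" side: recompute the digest and compare with the recorded one.
computed=$(sha512sum artifact.tar.gz | cut -d' ' -f1)
recorded=$(cat artifact.tar.gz.sha512)
if [ "$computed" = "$recorded" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH"
fi
```

Signature verification is the separate `gpg --verify` step against the committer keys in the project's KEYS file.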

On Tue, Sep 21, 2021 at 10:41 AM Jihoon Son  wrote:

> +1 (binding)
>
> src
> - verified the signature and checksum
> - LICENSE and NOTICE present
> - compiled and ran the licenses.yaml file check
> - ran RAT check
> - built binary, ingested some data using batch and kafka ingestion, and
> ran some queries
>
> bin
> - verified the signature and checksum
> - LICENSE and NOTICE present
> - ingested some data using batch and kafka ingestion and ran some queries
>
> docker
> - verified checksum
> - ingested some data using batch ingestion and ran some queries
>
> On Mon, Sep 20, 2021 at 8:12 PM Suneet Saldanha
>  wrote:
> >
> > +1 (binding)
> >
> > *Source code*
> >  - NOTICE and LICENSE files present
> >  - verified signatures and checksums
> >  - git.version file is present and correct
> >  - maven build passes locally
> >  - tests are green on travis for the latest commit in the release branch
> > https://app.travis-ci.com/github/apache/druid/builds/237760487
> >  - mvn rat checks passed
> >  - ran the batch ingestion quickstart with some simple queries
> >
> > *Binary*
> >  - NOTICE and LICENSE files present
> >  - Verified signatures and checksums
> >  - ran quickstart and did wikipedia ingest + sample queries
> >
> > On Sat, Sep 18, 2021 at 3:34 AM frank chen  wrote:
> >
> > > +1
> > >
> > > *Source code*
> > >
> > >- NOTICE and LICENSE files present
> > >- verified signatures and checksums
> > >- git.version file is present and correct
> > >- mvn build passed with unit tests
> > >- mvn rat checks passed
> > >- ran cluster with built binaries and ran native batch ingestion
> > >followed by basic queries
> > >
> > >
> > > *Binary*
> > >
> > >- NOTICE and LICENSE files present
> > >- Verified signatures and checksums
> > >- ran cluster with pre-packaged binaries and ran both native and
> kafka
> > >ingestion followed by basic queries
> > >
> > >
> > > *Docker*
> > >
> > >- verified checksum
> > >- started docker by docker-compose.xml
> > >- ran native ingestion and kafka ingestion followed by basic queries
> > >
> > >
> > > Thanks.
> > >
> > > On Thu, Sep 16, 2021 at 3:32 PM Clint Wylie  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have created a build for Apache Druid 0.22.0, release
> > > > candidate 1.
> > > >
> > > > Thanks to everyone who has helped contribute to the release! You can
> read
> > > > the proposed release notes here:
> > > > https://github.com/apache/druid/issues/11657
> > > >
> > > > The release candidate has been tagged in GitHub as
> > > > druid-0.22.0-rc1 (cc603d6118cf7f14056744d608b93d4d8fd4e710),
> > > > available here:
> > > > https://github.com/apache/druid/releases/tag/druid-0.22.0-rc1
> > > >
> > > > The artifacts to be voted on are located here:
> > > > https://dist.apache.org/repos/dist/dev/druid/0.22.0-rc1/
> > > >
> > > > A staged Maven repository is available for review at:
> > > >
> https://repository.apache.org/content/repositories/orgapachedruid-1026/
> > > >
> > > > Staged druid.apache.org website documentation is available here:
> > > > https://druid.staged.apache.org/docs/0.22.0/design/index.html
> > > >
> > > > A Docker image containing the binary of the release candidate can be
> > > > retrieved via:
> > > > docker pull apache/druid:0.22.0-rc1
> > > >
> > > > artifact checksums
> > > > src:
> > > >
> > > >
> > >
> 9073e4e4f1dedcddc7c644f11480ed5354093d3aff26986780e871b20b1bcfd64f9aafc7cc9e82aeb169c6534274aa3fa3f8710d53f3dd9c704ec32de7b2141d
> > > > bin:
> > > >
> > > >
> > >
> 6a2d191cda37e712a39e59066259f028fadf407a6b3f7a746908cd3c4ed10e6d43008778a045eefc551dbbc3e47f8c1cdbfaab44845fbc217d22029bc3b3c3de
> > > > docker:
> 626fd96a997361dce8452c68b28e935a2453153f0d743cf208a0b4355a4fc2c3
> > > >
> > > > Release artifacts are signed with the following key:
> > > > https://people.apache.org/keys/committer/cwylie.asc
> > > >
> > > > This key and the key of other committers can also be found in the
> > > project's
> > > > KEYS file here:
> > > > https://dist.apache.org/repos/dist/release/druid/KEYS
> > > >
> > > > (If you are a committer, please feel free to add your own key to that
> > > file
> > > > by following the instructions in the file's header.)
> > > >
> > > >
> > > > Verify checksums:
> > > > diff <(shasum -a512 apache-druid-0.22.0-src.tar.gz | \
> > > > cut -d ' ' -f1) \
> > > > <(cat apache-druid-0.22.0-src.tar.gz.sha512 ; echo)
> > > >
> > > > diff <(shasum -a512 apache-druid-0.22.0-bin.tar.gz | \
> > > > cut -d ' ' -f1) \
> > > > <(cat apache-druid-0.22.0-bin.tar.gz.sha512 ; echo)
> > > >
> > > > Verif

Re: Enabling dependabot in our github repository

2021-04-06 Thread Maytas Monsereenusorn
I remember seeing someone ask about Dependabot in the ASF Infra Slack
channel a few weeks ago. However, ASF Infra said they cannot allow it.
Here is the link:
https://the-asf.slack.com/archives/CBX4TSBQ8/p1616539376210800
I think this is the same as GitHub's Dependabot.

Best Regards,
Maytas


On Tue, Apr 6, 2021 at 2:37 PM Xavier Léauté  wrote:

> Hi folks, as you know Druid has a lot of dependencies, and keeping up with
> the latest versions of everything, whether it relates to fixing CVEs or
> other improvements is a lot of manual work.
>
> I suggest we enable Github's dependabot in our repository to keep our
> dependencies up to date. The bot is also helpful in providing a short
> commit log summary to understand changes.
> This might yield a flurry of PRs initially, but we can configure it to
> exclude libraries or version ranges that we know are unsafe for us to
> upgrade to.
>
> It looks like some other ASF repos have this enabled already (see
> https://github.com/apache/commons-imaging/pull/126), so hopefully this
> only
> requires filing an INFRA ticket.
>
> Happy to take care of it if folks are on board.
>
> Thanks!
> Xavier
>
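For reference, the "exclude libraries or version ranges" knob Xavier mentions maps to Dependabot's `ignore` configuration; a minimal sketch of a `.github/dependabot.yml` (the ignored coordinate below is a hypothetical placeholder, not a real Druid dependency):

```yaml
version: 2
updates:
  - package-ecosystem: "maven"   # Druid's build is Maven-based
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      # Hypothetical placeholder: hold back a library known to be unsafe to bump
      - dependency-name: "com.example:legacy-lib"
        versions: [">= 2.0"]
```

Additional `updates` entries could cover other ecosystems in the repo (e.g. the web console's npm dependencies) with their own ignore lists.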


Re: Forbidding forced git push

2021-01-15 Thread Maytas Monsereenusorn
We have a Git hook script included in the Druid repo (
https://github.com/apache/druid/blob/master/dev/intellij-setup.md#git-checkstyle-verification-hook-optional
).
Maybe preventing force pushes is something we can add to that Git hook, and
we can encourage more people to install it.
This way, you only have to take action once (installing the hook) instead of
remembering every time you push your commits.
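As a concrete illustration, such enforcement could live in a client-side pre-push hook. A minimal sketch (this is my own hypothetical script, not the existing Druid checkstyle hook):

```shell
#!/bin/sh
# Hypothetical pre-push hook sketch: reject non-fast-forward (force) pushes.
# Git feeds pre-push one line per ref being pushed:
#   <local ref> <local sha> <remote ref> <remote sha>
check_push() {
  local_sha=$2
  remote_sha=$4
  zero=0000000000000000000000000000000000000000
  # Deleting a remote branch or pushing a brand-new one is never "forced".
  [ "$local_sha" = "$zero" ] && return 0
  [ "$remote_sha" = "$zero" ] && return 0
  # If the remote tip is not an ancestor of what we are pushing, the push
  # rewrites remote history, i.e. it would require --force.
  git merge-base --is-ancestor "$remote_sha" "$local_sha"
}

# The real hook would loop over stdin and abort on the first violation:
# while read lref lsha rref rsha; do
#   check_push "$lref" "$lsha" "$rref" "$rsha" || exit 1
# done
```

Note a developer can always bypass hooks with `git push --no-verify`, so this would remain a soft policy rather than true enforcement.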

Best Regards,
Maytas

On Fri, Jan 15, 2021 at 2:19 PM Jihoon Son  wrote:

> @Clint thanks, I didn't know Github supports that now. I think it will make
> force pushes a bit better only when Github can restore the commit history
> somehow, such as when there are only new commits added without overwriting
> or deleting any previous commits.
>
> @Himanshu, thanks. That makes sense. Regarding forbidding force push for
> the master branch, I do agree that is the case we must not do force pushes.
> I can create an infra ticket for that, but I'm unsure how much gain we can
> get since we never push to the master directly.
> I would rather wait until we meet some cases where we have to update the
> master directly.
>
> I opened a PR for promoting the contributing doc and discouraging people
> from using force pushes (https://github.com/apache/druid/pull/10769).
>
> Thanks,
> Jihoon
>
> On Fri, Jan 15, 2021 at 2:06 PM Himanshu  wrote:
>
> > +1 for discouraging force push in the PRs since there is no way to
> enforce
> > it.
> >
> > > clean commit history is not a big gain compared to how much it can make
> > the review process worse, especially when the PR is big
> > The "commit history" of a PR is destroyed anyway when we do "Squash and
> > Merge", and the commit message in druid master is based on the PR title,
> > so maintaining a nice, clean-looking commit history in the PR isn't important.
> >
> > We certainly MUST NOT do a force push to apache/druid:master; that
> > could/should be enforced, I think. Force pushes to release branches are
> > tolerable in extreme circumstances like Clint described.
> >
> > I hope that makes sense.
> >
> > -- Himanshu
> >
> >
> >
> >
> > On Fri, Jan 15, 2021 at 1:47 PM Clint Wylie  wrote:
> >
> > > It seems like this will basically only affect release managers.
> > >
> > > I am maybe -1 since I have personally had to force push to a release
> > branch
> > > while making an RC, when I optimistically pushed the tags and then
> found
> > a
> > > mistake doing preflight checks before sending the artifacts out to
> vote.
> > I
> > > did this so that I didn't have to do something like jump from RC1 to
> RC3
> > > with a dead RC2 before it was even voted on.
> > >
> > > I find that since github added
> > > https://github.blog/changelog/2018-11-15-force-push-timeline-event/
> > > force-pushes aren't that terrible to deal with during a review, so I
> > > would probably personally be in favor of relaxing our soft policy on
> > > them, but it seems like everyone else is opposed to them, so I think it
> > > is also fine to keep the soft policy as is and add the link to it to
> > > the PR template.
> > >
> > > On Fri, Jan 15, 2021 at 1:26 PM Gian Merlino  wrote:
> > >
> > > > Will this help for the (common) case where PR branches are in
> people's
> > > > forks?
> > > >
> > > > On Fri, Jan 15, 2021 at 1:00 PM Jihoon Son 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The forced git push is usually used to make the commit history
> clean,
> > > > which
> > > > > I understand its importance. However, one of its downsides is,
> > because
> > > it
> > > > > overwrites the commit history, we cannot tell the exact change
> > between
> > > > > commits while reviewing a PR. This increases the burden for
> reviewers
> > > > > because they have to go through the entire PR again after a forced
> > > push.
> > > > > For the same reason, we suggest not using it in our
> > > > documentation (
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/druid/blob/master/CONTRIBUTING.md#if-your-pull-request-shows-conflicts-with-master
> > > > > ),
> > > > > but I don't believe this documentation is well read by many people
> > (It
> > > > is a
> > > > > good doc, BTW. Maybe we should promote it more effectively).
> > > > >
> > > > > Since branch sharing doesn't usually happen for us (AFAIK, there
> has
> > > been
> > > > > no branch sharing so far), I think this is the biggest downside of
> > > using
> > > > > forced push. To me, clean commit history is not a big gain compared
> > to
> > > > how
> > > > > much it can make the review process worse, especially when the PR
> is
> > > big.
> > > > >
> > > > > So, I would like to suggest forbidding git forced push for the
> Druid
> > > > > repository. It seems possible to disable it by creating an infra
> > > ticket (
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/INFRA-13613?jql=text%20~%20%22force%20push%22
> > > > > ).
> > > > > I can do it if everyone agrees.
> > > > >
> > > > > Would like to hear

Re: Forbidding forced git push

2021-01-15 Thread Maytas Monsereenusorn
Thank you for the proposal, Jihoon.
I am +1 on this and agree with all your points.

Best Regards,
Maytas

On Fri, Jan 15, 2021 at 1:05 PM Lucas Capistrant 
wrote:

> +1 from me. I completely agree that it creates pain for reviewers. I think
> it’s important to keep reviewing as frictionless as possible to help
> maintain community involvement
>
> On Fri, Jan 15, 2021 at 3:00 PM Jihoon Son  wrote:
>
> > Hi all,
> >
> > The forced git push is usually used to make the commit history clean,
> which
> > I understand its importance. However, one of its downsides is, because it
> > overwrites the commit history, we cannot tell the exact change between
> > commits while reviewing a PR. This increases the burden for reviewers
> > because they have to go through the entire PR again after a forced push.
> > For the same reason, we suggest not using it in our
> documentation (
> >
> >
> https://github.com/apache/druid/blob/master/CONTRIBUTING.md#if-your-pull-request-shows-conflicts-with-master
> > ),
> > but I don't believe this documentation is well read by many people (It
> is a
> > good doc, BTW. Maybe we should promote it more effectively).
> >
> > Since branch sharing doesn't usually happen for us (AFAIK, there has been
> > no branch sharing so far), I think this is the biggest downside of using
> > forced push. To me, clean commit history is not a big gain compared to
> how
> > much it can make the review process worse, especially when the PR is big.
> >
> > So, I would like to suggest forbidding git forced push for the Druid
> > repository. It seems possible to disable it by creating an infra ticket (
> >
> >
> https://issues.apache.org/jira/browse/INFRA-13613?jql=text%20~%20%22force%20push%22
> > ).
> > I can do it if everyone agrees.
> >
> > Would like to hear what people think.
> > Jihoon
> >
>


Re: Pull #9224 - Druid Coordinator Pause Feature

2020-01-20 Thread Maytas Monsereenusorn
I'm still pretty new to Druid and might be wrong, but I noticed the following
points in the documentation for the Coordinator (
https://druid.apache.org/docs/latest/design/coordinator.html):

*"The Druid Coordinator runs periodically and the time between each run is
a configurable parameter. Each time the Druid Coordinator runs, it assesses
the current state of the cluster before deciding on the appropriate actions
to take."*
Is it possible to use this configuration and set it to a really large
number to do what you wanted?

"*If the Druid Coordinator is not started up, no new segments will be loaded
in the cluster and outdated segments will not be dropped. However, the
Coordinator process can be started up at any time, and after a configurable
delay, will start running Coordinator tasks. This also means that if you
have a working cluster and all of your Coordinators die, the cluster will
continue to function, it just won’t experience any changes to its data
topology."*
From this, it seems like the Coordinator does not need to be running
either when other processes are starting up or after they are already up.

Best Regards,
Maytas

On Mon, Jan 20, 2020 at 2:10 PM Will Lauer 
wrote:

> I have no idea about the implementation, but the concept is certainly one
> we have been looking for for quite a while in the several clusters I
> manage. I'm excited to see this capability added to the system.
>
> Will
>
> On Mon, Jan 20, 2020, 1:55 PM Lucas Capistrant  >
> wrote:
>
> > Hi all,
> >
> > Looking for some feedback on the idea of creating a new dynamic config
> for
> > the coordinator that allows cluster admins to pause coordination by
> setting
> > the new config to true (default is false). By pause coordination, I mean
> to
> > skip running any coordinator helpers every time the coordinator runs.
> Some
> > more details are included below as well as a link to a PR with the
> initial
> > implementation that I came up with. Any feedback helps, we want to make
> > sure we are not overlooking any negative side effects!
> >
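The pause switch Lucas describes would presumably ride on the coordinator's existing dynamic configuration. A sketch of the payload (the key name `pauseCoordination` is my assumption based on this proposal, not a confirmed name):

```json
{
  "pauseCoordination": true
}
```

POSTed to the coordinator's existing `/druid/coordinator/v1/config` endpoint, this would tell every subsequent coordinator run to skip its helpers until the flag is set back to `false`.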
> > My organization is preparing to undergo some heavy maintenance on our
> HDFS
> > cluster that backs our production Druid clusters. This involves HDFS
> > downtime. Our plan was to stop the coordinators and overlords and rolling
> > restart the Historical nodes during the outage to lay down the new site
> > files and retain a static picture of the world for client queries to run
> > against. During our tests in staging we realized the Historicals check in
> > with the coordinator when starting up. Therefore, we wanted to find a way
> > to leave the coordinator up, but not actually coordinate segments on the
> > cluster, try to run kill tasks, etc. (because HDFS is offline and we don't
> > want to be talking with it until we know it is back up and healthy).
> Thus,
> > Pull
> > 9224  was born. This
> > seemed like an easy and effective way to halt coordination and keep the
> API
> > up.
> >
> > We've done some small scale testing in a dev environment and I am
> currently
> > looking into writing some type of integration test that flexes this code
> > path. Despite the change's perceived simplicity, it would be nice to have
> > something there.
> >
> > Thanks!
> > Lucas Capistrant
> >
>