Re: Cluster for Performance Testing Hive

2024-05-22 Thread Stamatis Zampetakis
Hive is open source so there are millions of ways to help even without
being under the sponsorship flag.

Quick examples:
"Hey, I have set up this machine there and I can give you access to
run benchmarks"
"In my company X we are using Hive and we decided to run nightly
benchmarks on our inhouse clusters; the results are publicly available
and you can access them here."

In a nutshell, we are open to any kind of help that someone is willing to offer.

The official website of Hive [1] has all the necessary links about
donations, sponsorships, etc., under the ASF dropdown menu.

Best,
Stamatis

[1] https://hive.apache.org/


On Wed, May 22, 2024 at 10:38 AM Eugene Ryan wrote:
>
> Thanks for that, Stamatis. Plenty of food for thought there. What do you
> think is the best way of getting sponsors on board - when they
> read/contribute here, for example?
>
> From the list of requirements to start a VM, the following could be used as 
> part of the process, I imagine:
> Maintainers:
> "Provide the name, Apache ID, and contact info for at least three PMC 
> members who will maintain the vm" - read “maintain cluster” here, or perhaps
> this would be the sponsor
>
> On Tue, May 21, 2024 at 1:36 PM Stamatis Zampetakis wrote:
>>
>> Hey Eugene,
>>
>> Having a cluster for performance testing is a great idea and it is
>> something that has popped up in various contexts.
>>
>> The most common way to obtain such clusters is via sponsors (companies
>> or individuals) donating resources to the project. For example, the
>> Hive CI is now running mostly on resources donated by Cloudera.
>>
>> There seems to be a process about requesting resources from the Apache
>> Infra team [1] but I am not aware of other ASF projects following this
>> path for performance testing. Most likely the easiest and fastest way
>> to move this forward is through a sponsor. Where the
>> resources come from will also determine the design, implementation,
>> and maintenance.
>>
>> Best,
>> Stamatis
>>
>> [1] https://infra.apache.org/vm-for-project.html
>>
>> On Tue, May 21, 2024 at 11:25 AM Eugene Ryan wrote:
>> >
>> > Hi,
>> >
>> > I'd like to get folks' opinions on having a public cluster for performance
>> > testing Hive code and getting an early read on whether a commit / build has
>> > caused a performance degradation over existing code.
>> >
>> > There are already well known workloads available, for example, TPC-DS 
>> > (https://github.com/hortonworks/hive-testbench) that can be run so I'm not 
>> > talking about performance test code itself (although that should be as 
>> > easy as possible on top of a dedicated cluster).
>> >
>> > The benefits to the community would be:
>> >  - A dedicated environment, not necessarily leaving it to the vendors to
>> >    integrate open-source later into their stacks and only find out some time
>> >    later about performance problems
>> >  - Something that can be left set up & running - no setup and tear-down
>> >    process needed every time a performance run is required
>> >  - An automated process for performance testing - no manual setup or
>> >    intervention
>> >
>> > Concerns:
>> >  - Budget
>> >  - Who administers the cluster, i.e. who sets it up and fixes it when it is down
>> >
>> > I'd like to get some opinions on what the process for getting this to
>> > happen would be, bearing in mind that certain things may well be obstacles 
>> > (budget) that have to be solved upfront before anything else happens:
>> >  - Budget approval
>> >  - Approval / Sign off - how & who?
>> >  - Architecture / pipeline design
>> >  - Implementation
>> >
>> > Thanks, all opinions welcome.
>> > Eugene
>> >
>
>
>
> --
> Eugene


Re: Cluster for Performance Testing Hive

2024-05-22 Thread Eugene Ryan
Thanks for that, Stamatis. Plenty of food for thought there. What do you
think is the best way of getting sponsors on board - when they
read/contribute here, for example?

From the list of requirements to start a VM, the following could be used as
part of the process, I imagine:
Maintainers:
"Provide the name, Apache ID, and contact info for at least three PMC
members who will maintain the vm" - read “maintain cluster” here, or
perhaps this would be the sponsor

On Tue, May 21, 2024 at 1:36 PM Stamatis Zampetakis wrote:

> Hey Eugene,
>
> Having a cluster for performance testing is a great idea and it is
> something that has popped up in various contexts.
>
> The most common way to obtain such clusters is via sponsors (companies
> or individuals) donating resources to the project. For example, the
> Hive CI is now running mostly on resources donated by Cloudera.
>
> There seems to be a process about requesting resources from the Apache
> Infra team [1] but I am not aware of other ASF projects following this
> path for performance testing. Most likely the easiest and fastest way
> to move this forward is through a sponsor. Where the
> resources come from will also determine the design, implementation,
> and maintenance.
>
> Best,
> Stamatis
>
> [1] https://infra.apache.org/vm-for-project.html
>
> On Tue, May 21, 2024 at 11:25 AM Eugene Ryan wrote:
> >
> > Hi,
> >
> > I'd like to get folks' opinions on having a public cluster for performance
> > testing Hive code and getting an early read on whether a commit / build has
> > caused a performance degradation over existing code.
> >
> > There are already well known workloads available, for example, TPC-DS
> > (https://github.com/hortonworks/hive-testbench) that can be run so I'm not
> > talking about performance test code itself (although that should be as
> > easy as possible on top of a dedicated cluster).
> >
> > The benefits to the community would be:
> >  - A dedicated environment, not necessarily leaving it to the vendors to
> >    integrate open-source later into their stacks and only find out some
> >    time later about performance problems
> >  - Something that can be left set up & running - no setup and tear-down
> >    process needed every time a performance run is required
> >  - An automated process for performance testing - no manual setup or
> >    intervention
> >
> > Concerns:
> >  - Budget
> >  - Who administers the cluster, i.e. who sets it up and fixes it when it is down
> >
> > I'd like to get some opinions on what the process for getting this to
> > happen would be, bearing in mind that certain things may well be
> > obstacles (budget) that have to be solved upfront before anything else
> > happens:
> >  - Budget approval
> >  - Approval / Sign off - how & who?
> >  - Architecture / pipeline design
> >  - Implementation
> >
> > Thanks, all opinions welcome.
> > Eugene
> >
>


-- 
Eugene