Re: Cluster for Performance Testing Hive

2024-05-21 Thread Stamatis Zampetakis
Hey Eugene,

Having a cluster for performance testing is a great idea and it is
something that has popped up in various contexts.

The most common way to obtain such clusters is via sponsors (companies
or individuals) donating resources to the project. For example, the
Hive CI is now running mostly on resources donated by Cloudera.

There seems to be a process for requesting resources from the Apache
Infra team [1], but I am not aware of other ASF projects following this
path for performance testing. Most likely the easiest and fastest way
to move this forward is through a sponsor. Where the resources come
from will also determine the design, implementation, and
maintenance.

Best,
Stamatis

[1] https://infra.apache.org/vm-for-project.html

On Tue, May 21, 2024 at 11:25 AM Eugene Ryan wrote:
>
> Hi,
>
> I'd like to get folks' opinions on having a public cluster for performance
> testing Hive code and getting an early read on whether a commit / build has
> caused a performance degradation over existing code.
>
> There are already well-known workloads available, for example TPC-DS
> (https://github.com/hortonworks/hive-testbench), that can be run, so I'm not
> talking about the performance test code itself (although that should be as
> easy as possible on top of a dedicated cluster).
>
> The benefits to the community would be:
>    - A dedicated environment, rather than leaving it to vendors to
>    integrate open-source code into their stacks and only find out about
>    performance problems some time later
>    - Something that can be left set up and running - no setup and tear-down
>    process needed every time a performance run is required
>    - An automated process for performance testing - no manual setup or
>    intervention
>
> Concerns:
>    - Budget
>    - Who administers the cluster, i.e., who sets it up and fixes it when it is down
>
> I'd like to get some opinions on what the process for getting this to
> happen would be, bearing in mind that certain things may well be obstacles 
> (budget) that have to be solved upfront before anything else happens:
>    - Budget approval
>    - Approval / Sign off - how & who?
>    - Architecture / pipeline design
>    - Implementation
>
> Thanks, all opinions welcome.
> Eugene
>


Cluster for Performance Testing Hive

2024-05-21 Thread Eugene Ryan
Hi,

I'd like to get folks' opinions on having a public cluster for performance
testing Hive code and getting an early read on whether a commit / build has
caused a performance degradation over existing code.

There are already well-known workloads available, for example TPC-DS
(https://github.com/hortonworks/hive-testbench), that can be run, so I'm not
talking about the performance test code itself (although that should be as
easy as possible on top of a dedicated cluster).

The benefits to the community would be:
   - A dedicated environment, rather than leaving it to vendors to
   integrate open-source code into their stacks and only find out about
   performance problems some time later
   - Something that can be left set up and running - no setup and tear-down
   process needed every time a performance run is required
   - An automated process for performance testing - no manual setup or
   intervention

Concerns:
   - Budget
   - Who administers the cluster, i.e., who sets it up and fixes it when it is down

I'd like to get some opinions on what the process for getting this to
happen would be, bearing in mind that certain things may well be obstacles
(budget) that have to be solved upfront before anything else happens:
   - Budget approval
   - Approval / Sign off - how & who?
   - Architecture / pipeline design
   - Implementation

Thanks, all opinions welcome.
Eugene