Custom Inference Fns in RunInference

2022-09-16 Thread Jack McCluskey via dev
Hey everyone,

I'm back with a brief design doc discussing ways that users could provide
custom inference functions for RunInference model handlers, which is
available now at
https://docs.google.com/document/d/1YYGsF20kminz7j9ifFdCD5WQwVl8aTeCo0cgPjbdFNU/edit?usp=sharing

It's not a large code change or a particularly long doc, but it
establishes a convention for model handlers going forward, and that
warrants some discussion.

Thanks,

Jack McCluskey

-- 


Jack McCluskey
SWE - DataPLS PLAT/ Beam Go
RDU
jrmcclus...@gmail.com
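
To make the idea of a user-supplied inference function concrete, here is a
minimal Python sketch. It is an illustration only: it does not use Beam, it
is not taken from the design doc, and the names SketchModelHandler,
default_inference_fn, and batched_inference_fn are hypothetical. The
assumption is simply that a handler takes an optional callable and delegates
the actual model call to it.

```python
from typing import Any, Callable, Iterable, List, Sequence

# Hypothetical signature: (batch, model) -> one prediction per input element.
InferenceFn = Callable[[Sequence[Any], Any], Iterable[Any]]


def default_inference_fn(batch: Sequence[Any], model: Any) -> Iterable[Any]:
    """Default behavior in this sketch: call the model once per element."""
    return [model(x) for x in batch]


def batched_inference_fn(batch: Sequence[Any], model: Any) -> Iterable[Any]:
    """Custom behavior: call the model once on the whole batch."""
    return model(list(batch))


class SketchModelHandler:
    """Hypothetical stand-in for a model handler; not Beam's own class."""

    def __init__(
        self,
        model_loader: Callable[[], Any],
        inference_fn: InferenceFn = default_inference_fn,
    ) -> None:
        self._model_loader = model_loader
        self._inference_fn = inference_fn

    def load_model(self) -> Any:
        # Loading stays separate from inference so the function can be swapped.
        return self._model_loader()

    def run_inference(self, batch: Sequence[Any], model: Any) -> List[Any]:
        # Delegate the actual model call to the configured function.
        return list(self._inference_fn(batch, model))
```

With this shape, swapping in different inference behavior means passing a
different callable to the constructor rather than subclassing the handler.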


Re: [ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Kenneth Knowles
Awesome. Thank you (all)! We've had so many conversations about it and it
is great to have it running continuously.

Kenn

On Fri, Sep 16, 2022 at 9:29 AM Sachin Agarwal via dev 
wrote:

> This is wonderful - thank you so much to you and the whole Talend team for
> making Beam better!
>
> On Fri, Sep 16, 2022 at 9:11 AM Alexey Romanenko 
> wrote:
>
>> Hi everybody,
>>
>> As some of you may know, at Talend we’ve been working for a while to add
>> the TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part
>> of the Beam testing workflow and release routine will help the community
>> quickly detect performance regressions or improvements, identify missing
>> or incorrect Beam SQL features, and execute Beam SQL on different runtime
>> environments with different runners.
>>
>> What is TPC-DS? From the TPC-DS specification document [1]:
>>
>> *“TPC-DS is a decision support benchmark that models several generally
>> applicable aspects of a decision support system, including queries and data
>> maintenance. The benchmark provides a representative evaluation of
>> performance as a general purpose decision support system.” *
>>
>> The TPC-DS benchmark suite for Beam is implemented as a separate testing
>> tool for the Java SDK (like the well-known Nexmark benchmark suite) [2].
>> It supports a limited number of TPC-DS SQL queries for now (mostly because
>> of limited SQL syntax support in Beam), accepts CSV and Parquet as input
>> data formats, and runs on Jenkins with the three most popular Beam runners
>> (Spark [3], Flink [4], Dataflow [5]). The job metrics are stored in
>> InfluxDB and can be accessed through Grafana dashboards [6][7][8].
>>
>> More details can be found in the Beam documentation [9].
>>
>> There are certainly still plenty of things to do, such as adding new
>> runners and support for other SDKs, data formats, etc., so your
>> contributions are very welcome in any form. Still, we already have a first
>> working and automated version that the community can use.
>>
>> Also, I’d like to thank everybody who worked on this improvement!
>>
>> —
>> Alexey
>>
>>
>> [1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
>> [2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
>> [3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
>> [4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
>> [5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
>> [6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
>> [7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
>> [8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
>> [9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/


Re: Beam Dependency Check Report (2022-09-15)

2022-09-16 Thread Kenneth Knowles
Oooh I misread the subject line. Disregard.

Kenn

On Fri, Sep 16, 2022 at 11:00 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:

> > I'm guessing https://github.com/apache/beam/pull/23229
>
> I don't think so. This dependency report comes from Jenkins, and that PR
> only affected things running from GitHub Actions. That change is for the
> high-priority issues report; this is supposed to be the dependency report
> (which doesn't seem to have been touched since June, when I made some
> changes around the Jira -> GitHub migration -
> https://github.com/apache/beam/tree/master/.test-infra/jenkins/dependency_check
> - it has succeeded a number of times since then, though).
>
> It also looks like this has been happening for a while -
> https://lists.apache.org/list?dev@beam.apache.org:2022-7:Beam%20Dependency%20Check%20Report
> (but it still doesn't line up with my Jira->GitHub change).
>
> I don't have any immediate ideas on what happened.
>
> On Fri, Sep 16, 2022 at 1:53 PM Kenneth Knowles  wrote:
>
>> I'm guessing https://github.com/apache/beam/pull/23229
>>
>> On Fri, Sep 16, 2022 at 7:15 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Is it a bug that this email is empty?
>>>
>>> > On 15 Sep 2022, at 19:40, Apache Jenkins Server <
>>> jenk...@builds.apache.org> wrote:
>>> >
>>>
>>>


Re: Beam Dependency Check Report (2022-09-15)

2022-09-16 Thread Danny McCormick via dev
> I'm guessing https://github.com/apache/beam/pull/23229

I don't think so. This dependency report comes from Jenkins, and that PR only
affected things running from GitHub Actions. That change is for the
high-priority issues report; this is supposed to be the dependency report
(which doesn't seem to have been touched since June, when I made some changes
around the Jira -> GitHub migration -
https://github.com/apache/beam/tree/master/.test-infra/jenkins/dependency_check
- it has succeeded a number of times since then, though).

It also looks like this has been happening for a while -
https://lists.apache.org/list?dev@beam.apache.org:2022-7:Beam%20Dependency%20Check%20Report
(but it still doesn't line up with my Jira->GitHub change).

I don't have any immediate ideas on what happened.

On Fri, Sep 16, 2022 at 1:53 PM Kenneth Knowles  wrote:

> I'm guessing https://github.com/apache/beam/pull/23229
>
> On Fri, Sep 16, 2022 at 7:15 AM Alexey Romanenko 
> wrote:
>
>> Is it a bug that this email is empty?
>>
>> > On 15 Sep 2022, at 19:40, Apache Jenkins Server <
>> jenk...@builds.apache.org> wrote:
>> >
>>
>>


Re: Beam Dependency Check Report (2022-09-15)

2022-09-16 Thread Kenneth Knowles
I'm guessing https://github.com/apache/beam/pull/23229

On Fri, Sep 16, 2022 at 7:15 AM Alexey Romanenko 
wrote:

> Is it a bug that this email is empty?
>
> > On 15 Sep 2022, at 19:40, Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
> >
>
>


Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-16 Thread Kenneth Knowles
We've got an "our infrastructure" section on the wiki. I expect it is
probably not super up to date.

On Thu, Sep 15, 2022 at 9:56 AM Brian Hulette via dev 
wrote:

> Is there somewhere we could document this?
>
> On Thu, Sep 15, 2022 at 6:45 AM Moritz Mack  wrote:
>
>> Thank you, Andrew!
>>
>> Exactly what I was looking for, that’s awesome!
>>
>>
>> On 15.09.22, 06:37, "Alexey Romanenko" wrote:
>>
>> Ahh, great! I didn’t know that the 'beam-perf' label is used for that.
>> Thanks!
>>
>> > On 14 Sep 2022, at 17:47, Andrew Pilloud  wrote:
>> >
>> > We do have a dedicated machine for benchmarks. This is a single
>> > machine limited to running one test at a time. Set the
>> > jenkinsExecutorLabel for the job to 'beam-perf' to use it. For
>> > example:
>> >
>> > https://github.com/apache/beam/blob/66bbee84ed477d86008905646e68b100591b6f78/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Direct.groovy#L36
>> >
>> > Andrew
>> >
>> > On Wed, Sep 14, 2022 at 8:28 AM Alexey Romanenko
>> >  wrote:
>> >>
>> >> I think it depends on the goal of running those benchmarks. Ideally, we
>> >> would run them on the same dedicated machine(s) with the same
>> >> configuration every time, but I’m not sure that can be achieved with
>> >> our current infrastructure.
>> >>
>> >> On the other hand, IIRC, the initial goal of benchmarks like Nexmark
>> >> was to quickly detect any major regressions, especially between
>> >> releases, and those are not so sensitive to ideal conditions. And here
>> >> we have room for improvement.
>> >>
>> >> —
>> >> Alexey
>> >>
>> >> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
>> >>
>> >> Good idea. I'm curious about our current benchmarks. Some of them run
>> >> on clusters, but I think some of them are running locally and just
>> >> being noisy. Perhaps this could improve that. (Or if they are running
>> >> on local Spark/Flink, then maybe the results are not really meaningful
>> >> anyhow.)
>> >>
>> >> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack  wrote:
>> >>>
>> >>> Hi team,
>> >>>
>> >>> I’m looking for some help setting up infrastructure to periodically
>> >>> run Java microbenchmarks (JMH).
>> >>>
>> >>> Results of these runs will be added to our community metrics
>> >>> (InfluxDB) to help us track performance; see [1].
>> >>>
>> >>> To prevent noisy runs, this would require a dedicated Jenkins machine
>> >>> that runs at most one job (benchmark) at a time. Benchmark runs take
>> >>> quite some time, but on the other hand they don’t have to run very
>> >>> frequently (once a week should be fine initially).
>> >>>
>> >>> Thanks so much,
>> >>>
>> >>> Moritz
>> >>>
>> >>>
>> >>>
>> >>> [1] https://github.com/apache/beam/pull/23041
>> >>>
>> >>> As a recipient of an email from Talend, your contact personal data
>> will be on our systems. Please see our privacy notice.


Re: [ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Sachin Agarwal via dev
This is wonderful - thank you so much to you and the whole Talend team for
making Beam better!

On Fri, Sep 16, 2022 at 9:11 AM Alexey Romanenko 
wrote:

> Hi everybody,
>
> As some of you may know, at Talend we’ve been working for a while to add
> the TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part
> of the Beam testing workflow and release routine will help the community
> quickly detect performance regressions or improvements, identify missing
> or incorrect Beam SQL features, and execute Beam SQL on different runtime
> environments with different runners.
>
> What is TPC-DS? From the TPC-DS specification document [1]:
>
> *“TPC-DS is a decision support benchmark that models several generally
> applicable aspects of a decision support system, including queries and data
> maintenance. The benchmark provides a representative evaluation of
> performance as a general purpose decision support system.” *
>
> The TPC-DS benchmark suite for Beam is implemented as a separate testing
> tool for the Java SDK (like the well-known Nexmark benchmark suite) [2].
> It supports a limited number of TPC-DS SQL queries for now (mostly because
> of limited SQL syntax support in Beam), accepts CSV and Parquet as input
> data formats, and runs on Jenkins with the three most popular Beam runners
> (Spark [3], Flink [4], Dataflow [5]). The job metrics are stored in
> InfluxDB and can be accessed through Grafana dashboards [6][7][8].
>
> More details can be found in the Beam documentation [9].
>
> There are certainly still plenty of things to do, such as adding new
> runners and support for other SDKs, data formats, etc., so your
> contributions are very welcome in any form. Still, we already have a first
> working and automated version that the community can use.
>
> Also, I’d like to thank everybody who worked on this improvement!
>
> —
> Alexey
>
>
> [1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
> [2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
> [3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
> [4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
> [5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
> [6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
> [7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
> [8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
> [9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/


[ANNOUNCE][Testing] TPC-DS benchmark suite in Beam

2022-09-16 Thread Alexey Romanenko
Hi everybody,

As some of you may know, at Talend we’ve been working for a while to add the
TPC-DS benchmark suite to Beam. We believe that having TPC-DS as part of the
Beam testing workflow and release routine will help the community quickly
detect performance regressions or improvements, identify missing or
incorrect Beam SQL features, and execute Beam SQL on different runtime
environments with different runners.

What is TPC-DS? From the TPC-DS specification document [1]:

“TPC-DS is a decision support benchmark that models several generally 
applicable aspects of a decision support system, including queries and data 
maintenance. The benchmark provides a representative evaluation of performance 
as a general purpose decision support system.” 

The TPC-DS benchmark suite for Beam is implemented as a separate testing tool
for the Java SDK (like the well-known Nexmark benchmark suite) [2]. It
supports a limited number of TPC-DS SQL queries for now (mostly because of
limited SQL syntax support in Beam), accepts CSV and Parquet as input data
formats, and runs on Jenkins with the three most popular Beam runners
(Spark [3], Flink [4], Dataflow [5]). The job metrics are stored in InfluxDB
and can be accessed through Grafana dashboards [6][7][8].

More details can be found in the Beam documentation [9].

There are certainly still plenty of things to do, such as adding new runners
and support for other SDKs, data formats, etc., so your contributions are
very welcome in any form. Still, we already have a first working and
automated version that the community can use.

Also, I’d like to thank everybody who worked on this improvement!

—
Alexey


[1] https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
[2] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Spark/
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Flink/
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Tpcds_Dataflow/
[6] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[7] http://metrics.beam.apache.org/d/8INnSY9Mv/tpc-ds-flink-sql?orgId=1
[8] http://metrics.beam.apache.org/d/tkqc0AdGk2/tpc-ds-spark-classic-new-sql?orgId=1
[9] https://beam.apache.org/documentation/sdks/java/testing/tpcds/








Re: Beam Dependency Check Report (2022-09-15)

2022-09-16 Thread Alexey Romanenko
Is it a bug that this email is empty? 

> On 15 Sep 2022, at 19:40, Apache Jenkins Server  
> wrote:
>