Performance Testing Next Steps

2017-02-17 Thread Jason Kuster
Hi all,

I've written up a doc on next steps for getting performance testing up and
running for Beam. I'd love to hear from people -- there's a fair amount of
work encapsulated in here, but the end result is a performance testing
system that we can use to benchmark all aspects of Beam, which would be
really exciting. Looking forward to your thoughts.

https://docs.google.com/document/d/1PsjGPSN6FuorEEPrKEP3u3m16tyOzph5FnL2DhaRDz0/edit?ts=58a78e73

Best,

Jason

-- 
---
Jason Kuster
Apache Beam / Google Cloud Dataflow


Re: Performance Testing Next Steps

2017-02-22 Thread Jason Kuster
Hey all, just wanted to bump this again for people -- if anyone has
thoughts on performance testing, please feel welcome to chime in. :)



Re: Performance Testing Next Steps

2017-02-28 Thread Kenneth Knowles
Just got a chance to look this over. I don't have anything to add, but I'm
pretty excited to follow this project. Have the JIRAs been filed since you
shared the doc?



Re: Performance Testing Next Steps

2017-03-01 Thread Aljoscha Krettek
Thanks for writing this and taking care of this, Jason!

I'm afraid I also cannot add anything except that I'm excited to see some
results from this.



Re: Performance Testing Next Steps

2017-03-02 Thread Jason Kuster
Glad to hear the excitement. :)

Filed BEAM-1595 through BEAM-1609 to track the work items. Some of these fall
under runner components; please feel free to reach out to me if you have any
questions about how to accomplish them.

Best,

Jason



Re: Performance Testing Next Steps

2017-03-02 Thread Ahmet Altay
Thank you Jason, this is great.

Which of these issues fall under sdk-py?

Ahmet



Re: Performance Testing Next Steps

2017-03-02 Thread Jason Kuster
D'oh, my bad, Ahmet. I've opened BEAM-1610, which covers support for Python
in PKB against the Dataflow runner. Once the Fn API progresses some more, we
can add some work items for the other runners too. Let's chat about this
more, maybe next week?



Re: Performance Testing Next Steps

2017-03-02 Thread Ahmet Altay
Sounds great, thank you!



Re: Performance Testing Next Steps

2017-03-02 Thread Amit Sela
Looks great, and I'll be sure to follow this. Ping me if I can assist in
any way!



Re: Performance Testing Next Steps

2017-03-15 Thread Ismaël Mejía
Excellent proposal. Sorry to jump into this discussion so late; this
was on my to-read list for almost two weeks, and I finally got the time
to read the document. I have two minor comments:

I have the impression that the strict separation of Providers (the
data-processing systems) and Resources (the concrete data stores)
makes sense for the general case, but falls short if what we want to
test are things in the Hadoop ecosystem, where the data stores commonly
co-exist on the same group of machines as the data-processing
systems (the Providers), e.g. HDFS, HBase + YARN. This is important,
for example, to test that data locality works correctly. Have
you considered such a case?

Another thing I noticed is that the Direct Runner is not included in
the list of runners supporting PKB. Is there any particular reason for
this? I think that even if performance is not the main goal of the
Direct Runner, it would be nice to have it there too, to catch any
performance regressions. Or is it omitted because it is already ready
for it? What do you think?

Thanks,
Ismaël



Re: Performance Testing Next Steps

2017-03-16 Thread Jason Kuster
Thanks Ismael for the comments! Replied inline.

On Wed, Mar 15, 2017 at 8:18 AM, Ismaël Mejía wrote:

> Excellent proposal. Sorry to jump into this discussion so late; this
> was on my to-read list for almost two weeks, and I finally got the time
> to read the document. I have two minor comments:
>
> I have the impression that the strict separation of Providers (the
> data-processing systems) and Resources (the concrete data stores)
> makes sense for the general case, but falls short if what we want to
> test are things in the Hadoop ecosystem, where the data stores commonly
> co-exist on the same group of machines as the data-processing
> systems (the Providers), e.g. HDFS, HBase + YARN. This is important,
> for example, to test that data locality works correctly. Have
> you considered such a case?
>

Definitely interesting to think about, and I don't think I added provisions
for this in the doc. My impression, though, is that since the providers and
the data stores are not coupled, if the provider we are bringing up also
provides the data store, we can just omit the data store for that benchmark
and use what we've already brought up. Does that answer your question, or
have I misunderstood?
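
A minimal sketch of the idea above, in Python. Every name here (`Provider`,
`BenchmarkSpec`, `resources_to_provision`, and the store names) is a
hypothetical illustration, not a PKB or Beam API. It assumes a benchmark
declares the data stores it needs, and any store the provider already brings
up is skipped rather than provisioned separately.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provider:
    """A data-processing system brought up for a benchmark (hypothetical model)."""
    name: str
    # Data stores that come up together with the provider,
    # e.g. HDFS and HBase on a YARN cluster.
    bundled_stores: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class BenchmarkSpec:
    """What a single benchmark needs in order to run (hypothetical model)."""
    provider: Provider
    required_stores: frozenset


def resources_to_provision(spec: BenchmarkSpec) -> frozenset:
    """Return only the data stores that must be brought up separately.

    Stores the provider already supplies are omitted, so the benchmark
    reuses the co-located instance instead of creating a second one.
    """
    return spec.required_stores - spec.provider.bundled_stores


# Hypothetical example: a YARN cluster that already ships HDFS and HBase.
yarn = Provider("yarn-cluster", frozenset({"hdfs", "hbase"}))
spec = BenchmarkSpec(provider=yarn, required_stores=frozenset({"hdfs", "cassandra"}))
print(sorted(resources_to_provision(spec)))
```

With this shape, a benchmark that reads from HDFS on the same cluster has
nothing extra to provision for HDFS, which keeps the data store co-located
with the processing system.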

> Another thing I noticed is that the Direct Runner is not included in
> the list of runners supporting PKB. Is there any particular reason for
> this? I think that even if performance is not the main goal of the
> Direct Runner, it would be nice to have it there too, to catch any
> performance regressions. Or is it omitted because it is already ready
> for it? What do you think?
>
Great point -- I neglected to include the DirectRunner in the plans here.
I'll add it to the doc and file a JIRA.


Re: Performance Testing Next Steps

2017-03-16 Thread Ismaël Mejía
> ... if the provider we are bringing up also
> provides the data store, we can just omit the data store for that benchmark
> and use what we've already brought up. Does that answer your question, or
> have I misunderstood?

Yes, and it is a perfect approach for the case, great idea.

> Great point -- I neglected to include the DirectRunner in the plans here.
> I'll add it to the doc and file a JIRA.

Excellent.

This work is super interesting, so don't hesitate to ask the rest of us
in the community for anything; I think many of us are interested and
can lend a hand if needed.

