Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-11 Thread Kamil Szewczyk
Hi all,

As a positive outcome of extending the Kubernetes cluster, we can observe
better test stability after the resize, both at the bottom of
https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_Analysis/37/consoleText
and on the dedicated Slack channel
https://apachebeam.slack.com/messages/CAB3W69SS/. Most execution times
decreased slightly and, finally, all tests were executed and analysed.

Thanks,
Kamil Szewczyk



2018-06-08 13:13 GMT+02:00 Łukasz Gajowy :

> @Pablo, it is exactly as Chamikara says. In fact, there is a dedicated
> Google Cloud project for the whole testing infrastructure (called
> "apache-beam-testing"). It provides the Kubernetes cluster for the data
> stores as well as the BigQuery storage for the test results presented in
> the testing dashboard.
>
> @Alan thanks a lot!
>
> Best regards,
> Łukasz


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-08 Thread Łukasz Gajowy
@Pablo, it is exactly as Chamikara says. In fact, there is a dedicated
Google Cloud project for the whole testing infrastructure (called
"apache-beam-testing"). It provides the Kubernetes cluster for the data
stores as well as the BigQuery storage for the test results presented in
the testing dashboard.

@Alan thanks a lot!

Best regards,
Łukasz



On Thu, Jun 7, 2018 at 22:37 Chamikara Jayalath
wrote:

> We still use Jenkins machines to execute the tests, but the data stores
> are hosted in Kubernetes.


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-07 Thread Chamikara Jayalath
We still use Jenkins machines to execute the tests, but the data stores
are hosted in Kubernetes.

On Thu, Jun 7, 2018 at 1:35 PM Pablo Estrada  wrote:

> Just out of curiosity: This does not use the Jenkins machines then?
> -P.


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-07 Thread Pablo Estrada
Just out of curiosity: This does not use the Jenkins machines then?
-P.

On Thu, Jun 7, 2018 at 1:33 PM Alan Myrvold  wrote:

> Done. I changed the size of the io-datastores Kubernetes cluster in
> apache-beam-testing to 3 nodes.


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-07 Thread Alan Myrvold
Done. I changed the size of the io-datastores Kubernetes cluster in
apache-beam-testing to 3 nodes.

On Thu, Jun 7, 2018 at 1:45 AM Kamil Szewczyk  wrote:

> Hi,
>
> the node pool size of the io-datastores Kubernetes cluster in the
> apache-beam-testing project must be changed from 1 to 3 (or another
> value). @Alan Myrvold has already been helpful with the Kubernetes cluster
> settings, but I don't know who makes decisions about this, as it will
> increase the monthly bill.
>
> Kamil Szewczyk


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-07 Thread Kamil Szewczyk
Hi,

the node pool size of the io-datastores Kubernetes cluster in the
apache-beam-testing project must be changed from 1 to 3 (or another
value). @Alan Myrvold has already been helpful with the Kubernetes cluster
settings, but I don't know who makes decisions about this, as it will
increase the monthly bill.

Kamil Szewczyk

2018-06-07 6:27 GMT+02:00 Kenneth Knowles :

> This is rad. Another +1 from me for a bigger cluster. What do you need to
> make that happen?
>
> Kenn


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Kenneth Knowles
This is rad. Another +1 from me for a bigger cluster. What do you need to
make that happen?

Kenn

On Wed, Jun 6, 2018 at 10:16 AM Pablo Estrada  wrote:

> This is really cool!
>
> +1 for having the tests run on a cluster with more than one machine.
>
> -P.


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Pablo Estrada
This is really cool!

+1 for having the tests run on a cluster with more than one machine.

-P.

On Wed, Jun 6, 2018 at 9:57 AM Chamikara Jayalath 
wrote:

> This is great. Also, it looks like the results are available in the test
> dashboard:
> https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
> (BTW, we should add information about the dashboard to the testing doc:
> https://beam.apache.org/contribute/testing/)
>
> +1 for increasing the size of the Kubernetes cluster.
--
Got feedback? go/pabloem-feedback


Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Chamikara Jayalath
On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy 
wrote:

> Hi all,
>
> I'd like to announce that, thanks to Kamil Szewczyk, since this PR
> we have 4 file-based HDFS tests running on a "Large HDFS Cluster"!
> More specifically:
>
> - beam_PerformanceTests_Compressed_TextIOIT_HDFS
> - beam_PerformanceTests_TextIOIT_HDFS
> - beam_PerformanceTests_AvroIOIT_HDFS
> - beam_PerformanceTests_XmlIOIT_HDFS
>
> The "Large HDFS Cluster" (in contrast to the small one, which is also
> available) consists of a master node and three data nodes, each in a
> separate pod. Thanks to that, we can mimic more real-life HDFS scenarios
> (3 distributed nodes) and possibly run bigger tests, so there's progress! :)
>
>
This is great. Also, it looks like the results are available in the test
dashboard:
https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
(BTW, we should add information about the dashboard to the testing doc:
https://beam.apache.org/contribute/testing/)

> I'm currently working on proper documentation for this so that everyone can
> use it in IOITs (stay tuned).
>
> Regarding the above, I'd like to propose scaling up the Kubernetes
> cluster. AFAIK, it currently consists of 1 node. If we scale it up to,
> e.g., 3 nodes, the HDFS Kubernetes pods will be spread across different
> machines rather than one, making it an even more "real-life" scenario
> (and possibly more efficient?). Moreover, other performance tests (such
> as JDBC or MongoDB) could use more space for their infrastructure as
> well. Scaling up the cluster could also prove useful for future efforts,
> like BEAM-4508 [1] (adapting and running some old IOITs on Jenkins).
>
> WDYT? Are there any objections?
>
+1 for increasing the size of the Kubernetes cluster.

>
> [1] https://issues.apache.org/jira/browse/BEAM-4508
>
>