Re: Announcement & Proposal: HDFS tests on large cluster.

Kamil Szewczyk Thu, 07 Jun 2018 01:46:05 -0700

Hi,

The node pool size of the io-datastores Kubernetes cluster in the
apache-beam-testing project must be changed from 1 to 3 (or another value).
@Alan Myrvold has been helpful with the Kubernetes cluster settings so far,
but I don't know who makes decisions about this, given that it
will increase the monthly bill.


Kamil Szewczyk

2018-06-07 6:27 GMT+02:00 Kenneth Knowles <[email protected]>:

> This is rad. Another +1 from me for a bigger cluster. What do you need to
> make that happen?
>
> Kenn
>
> On Wed, Jun 6, 2018 at 10:16 AM Pablo Estrada <[email protected]> wrote:
>
>> This is really cool!
>>
>> +1 for having a cluster with more than one machine run the test.
>>
>> -P.
>>
>> On Wed, Jun 6, 2018 at 9:57 AM Chamikara Jayalath <[email protected]>
>> wrote:
>>
>>> On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to announce that thanks to Kamil Szewczyk, since this PR
>>>> <https://github.com/apache/beam/pull/5441> we have 4 file-based HDFS
>>>> tests running on a "Large HDFS Cluster"! More specifically, I mean:
>>>>
>>>> - beam_PerformanceTests_TextIOIT_HDFS
>>>> - beam_PerformanceTests_Compressed_TextIOIT_HDFS
>>>> - beam_PerformanceTests_AvroIOIT_HDFS
>>>> - beam_PerformanceTests_XmlIOIT_HDFS
>>>>
>>>> The "Large HDFS Cluster" (in contrast to the small one, which is also
>>>> available) consists of a master node and three data nodes, all in separate
>>>> pods. Thanks to that, we can mimic more real-life scenarios on HDFS (3
>>>> distributed nodes) and possibly run bigger tests, so there's progress! :)
>>>>
>>>>
>>> This is great. Also, it looks like the results are available in the test dashboard:
>>> https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
>>> (BTW, we should add information about the dashboard to the testing doc:
>>> https://beam.apache.org/contribute/testing/)
>>>
>>>> I'm currently working on proper documentation for this so that everyone
>>>> can use it in IOITs (stay tuned).
>>>>
>>>> Regarding the above, I'd like to propose scaling up the
>>>> Kubernetes cluster. AFAIK, it currently consists of 1 node. If we scale it
>>>> up to, e.g., 3 nodes, the HDFS Kubernetes pods will be distributed across
>>>> different machines rather than one, making it an even more "real-life"
>>>> scenario (possibly more efficient?). Moreover, other Performance Tests
>>>> (such as JDBC or MongoDB) could use more space for their infrastructure as
>>>> well. Scaling up the cluster could also prove useful for some future
>>>> efforts, like BEAM-4508 [1] (adapting and running some old IOITs on
>>>> Jenkins).
>>>>
>>>> WDYT? Are there any objections?
>>>>
>>> +1 for increasing the size of Kubernetes cluster.
>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/BEAM-4508
>>>>
>> --
>> Got feedback? go/pabloem-feedback
>> <https://goto.google.com/pabloem-feedback>
>>
>
