Hi all,

I'd like to announce that, thanks to Kamil Szewczyk, since this PR
<https://github.com/apache/beam/pull/5441> was merged we have 4 file-based
HDFS tests running on a "Large HDFS Cluster"! More specifically, I mean:

- beam_PerformanceTests_TextIOIT_HDFS
- beam_PerformanceTests_Compressed_TextIOIT_HDFS
- beam_PerformanceTests_AvroIOIT_HDFS
- beam_PerformanceTests_XmlIOIT_HDFS

The "Large HDFS Cluster" (in contrast to the small one, that is also
available) consists of a master node and three data nodes all in separate
pods. Thanks to that we can mimic more real-life scenarios on HDFS (3
distributed nodes) and possibly run bigger tests so there's progress! :)

I'm currently working on proper documentation so that everyone can use the
cluster in their IOITs (stay tuned).
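In the meantime, here's a rough sketch of what writing to the cluster from
a Java pipeline could look like, using Beam's HadoopFileSystemOptions. This
is just an illustration, not what the IOITs do verbatim: the namenode
address below is a placeholder for whatever service name the Kubernetes
scripts actually expose, and the real tests wire this up through their own
pipeline options.

    import java.util.Collections;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.hadoop.conf.Configuration;

    public class HdfsWriteSketch {
      public static void main(String[] args) {
        // Point Beam's Hadoop filesystem at the HDFS namenode service.
        // "large-hdfs-cluster-namenode:8020" is a hypothetical address --
        // substitute the service exposed by your Kubernetes setup.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://large-hdfs-cluster-namenode:8020");

        HadoopFileSystemOptions options =
            PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class);
        options.setHdfsConfiguration(Collections.singletonList(conf));

        // A trivial pipeline that writes a couple of lines to HDFS.
        Pipeline p = Pipeline.create(options);
        p.apply(Create.of("line one", "line two"))
            .apply(TextIO.write().to("hdfs://large-hdfs-cluster-namenode:8020/TEXTIO_IT_"));
        p.run().waitUntilFinish();
      }
    }

With something like this in place, switching between the small and the
large cluster should just be a matter of pointing the configuration at a
different service.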

Regarding the above, I'd like to propose scaling up the Kubernetes cluster.
AFAIK it currently consists of 1 node. If we scale it up to, e.g., 3 nodes,
the HDFS Kubernetes pods will be distributed across different machines
rather than a single one, making the scenario even more "real-life" (and
possibly more efficient). Moreover, other performance tests (such as the
JDBC or MongoDB ones) could use the additional capacity for their
infrastructure as well. Scaling up the cluster could also prove useful for
future efforts, like BEAM-4508 [1] (adapting some old IOITs and running
them on Jenkins).

WDYT? Are there any objections?

[1] https://issues.apache.org/jira/browse/BEAM-4508
