The Direct Runner, as currently implemented, is purposely inefficient. It was designed for testing, and therefore does many things that are meant to expose bugs in user pipelines (e.g. randomly reordering the elements of PCollections, serializing/deserializing every element, etc.). So it's not surprising that it doesn't behave well under load tests.

Reuven

On Thu, Jan 10, 2019 at 5:55 AM Katarzyna Kucharczyk <[email protected]> wrote:

> Hi Everyone,
>
> My name is Kasia and I contribute to Beam's tests. Currently, I am working
> with Łukasz Gajowy on load tests and we created Jenkins configuration to
> run Synthetic Sources test on DirectRunner. It was decided to generate
> 1 000 000 000 records (bytes) for a small suite (details you can find in
> this proposal [1]). Running this test on the Beam’s Jenkins is causing
> runtime exception with the message:
> "java.lang.OutOfMemoryError: GC overhead limit exceeded".
>
> Of course, this is not a surprise since it's a lot of data. That's why I
> am asking for your advice/opinion:
> Do you think if this test should be smaller? On the other hand, if it's
> going to be smaller would it be still worth testing as a load test?
> Maybe it would be better to wait for the UniversalLocalRunner instead and
> use it while it's there? What is the status of ULR? Do you know if the ULR
> will replace DirectRunner?
>
> I created an issue [2] with details of this problem where you can find the
> link to the example of a failing job.
>
> Thanks,
> Kasia
>
> [1] https://s.apache.org/load-test-basic-operations
> [2] https://issues.apache.org/jira/browse/BEAM-6351
