lostluck edited a comment on pull request #13436:
URL: https://github.com/apache/beam/pull/13436#issuecomment-740863368
WRT the investigation we've been doing into a Flink failure with large
settings:
In my own investigation of the problem on the Google internal runner, the
"big" configuration turns out to be too large for protocol buffer serialization:
it produces a single ~10GB StateResponse, which causes the failure. Protos have
a hard cap of 2GB on serialized size.
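To make the size gap concrete, here's a rough sketch of the arithmetic. It assumes the cap is 2^31 - 1 bytes (`math.MaxInt32`) and ignores per-message framing overhead. `fitsInOneMessage` and `minPages` are hypothetical helpers for illustration, not Beam APIs. A ~10GB payload would need at least 5 messages under the cap, while 2,000,000,000 bytes (the raw key and value bytes of 2,000,000 records of 100 + 900 bytes) still fits in one.

```go
package main

import (
	"fmt"
	"math"
)

// maxMessageBytes is an ASSUMED cap on one serialized protobuf message:
// 2^31 - 1 bytes, the largest size a signed 32-bit length can describe.
const maxMessageBytes int64 = math.MaxInt32

// fitsInOneMessage (hypothetical helper) reports whether n bytes could be
// sent as a single message under the assumed cap.
func fitsInOneMessage(n int64) bool {
	return n <= maxMessageBytes
}

// minPages (hypothetical helper) returns the smallest number of messages
// needed to carry n bytes when no message may exceed maxMessageBytes.
// It uses ceiling division and ignores encoding overhead.
func minPages(n int64) int64 {
	if n <= 0 {
		return 0
	}
	return (n + maxMessageBytes - 1) / maxMessageBytes
}

func main() {
	big := int64(10_000_000_000) // ~10GB, like the StateResponse above
	fmt.Println(fitsInOneMessage(big), minPages(big)) // false 5

	small := int64(2_000_000_000) // 2,000,000 * (100 + 900) raw bytes
	fmt.Println(fitsInOneMessage(small), minPages(small)) // true 1
}
```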
From the #beam-go Slack discussion and other research, Java and Python also
fail with large configurations, because paging through large side inputs isn't
implemented anywhere yet.
`--input_options='{"num_records": 2000000, "key_size":100, "value_size":900}' --access_percentage=1`
and
`--input_options='{"num_records": 10000000, "key_size":100, "value_size":180}' --access_percentage=1`
both work, because they cut the total data down to 1/5th.
This affects all portable runners but none of the legacy ones, because the
legacy runners don't pass data around through the protos. A JIRA is being
filed for it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.