Hi,

I would like to propose changing[1] the thread pool type used by BigQuery
streaming inserts in the Java SDK (BEAM-6443). When small amounts of data
are inserted into BigQuery at a high rate with BigQueryIO.write, the log
fills with rate-limit-exceeded errors. The main cause is that far too many
threads are used for the insert job (50 shards * dozens of futures per
bundle, executed by an unbounded thread pool). I ran some benchmarks[2]
and found that switching from the unbounded thread pool to a single-thread
pool reduces the number of (repeated, possibly meaningless) error messages
by 1/4 while retaining the same performance. I do not expect this change
to hurt any important performance measure, but if anybody has concerns
about it, please let me know.
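
To illustrate the difference between the two pool types, here is a minimal,
self-contained sketch. It is not the code from the pull request: the class
name PoolComparison, the method peakConcurrency, the task count, and the
50 ms sleep standing in for an insert request are all assumptions made for
this example. It only measures how many simulated requests are in flight
at once for each pool type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not Beam code.
class PoolComparison {

  // Submits `tasks` simulated insert calls to `pool` and returns the highest
  // number of calls that were in flight at the same time.
  static int peakConcurrency(ExecutorService pool, int tasks) throws Exception {
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < tasks; i++) {
      futures.add(pool.submit(() -> {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
          // Stand-in for one insert request (assumed latency).
          Thread.sleep(50);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          inFlight.decrementAndGet();
        }
      }));
    }
    // Wait for every simulated request to finish before reading the peak.
    for (Future<?> f : futures) {
      f.get();
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return peak.get();
  }

  public static void main(String[] args) throws Exception {
    // Unbounded pool: a new thread is created whenever none is idle.
    int cached = peakConcurrency(Executors.newCachedThreadPool(), 20);
    // Single-thread pool: submitted tasks run one at a time.
    int single = peakConcurrency(Executors.newSingleThreadExecutor(), 20);
    System.out.println("cached pool peak concurrency: " + cached);
    System.out.println("single-thread pool peak concurrency: " + single);
  }
}
```

With the single-thread executor the peak is always 1, so each bundle issues
its requests sequentially; with the cached pool the peak grows with the
number of submitted futures, which is the kind of fan-out described above.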

Thanks,

[1] https://github.com/apache/beam/pull/7547
[2] https://docs.google.com/document/d/1EhRNWLevm86GD_QtvlrTauHITVMwQBzuemyp-w4Z_ck/edit#heading=h.c0angyd9tn21
