Hello Diego,

Right now we are using BigtableIO, so I will continue to use that one!

For the second part, I'll explain a bit more about what we saw, since I
simplified things in my original email. At one point we had two streaming
pipelines writing to Bigtable, and we decided to combine them into one
pipeline that writes to multiple Bigtable tables. What we found is that our
network traffic to Bigtable went up by a bit more than 3x compared to when
the pipelines were separate. Our node count stayed about the same; looking
back, I think I misremembered that part. We opened a Google support ticket
at the time to see what we could do to remedy this, since we didn't expect
that much of a cost increase, and they told us it was due to the new
implementation batching fewer mutations per request (resulting in more
write requests) than the old one did. We were advised to tune the bulk
options, but we haven't really had a chance to yet, so I will try that at
some point. Could anyone shed light on whether that is the best way to
configure how much BigtableIO batches requests, or is there more that could
be done?
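
For reference, here is roughly what I plan to try, based on the advice from
the support ticket (a sketch only: the project/instance/table IDs and the
batch sizes are placeholders, and the exact knobs available may differ by
Beam version):

    import com.google.cloud.bigtable.config.BulkOptions;
    import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;

    // Configure how aggressively BigtableIO batches mutations into bulk
    // write requests. Larger batches mean fewer, bigger requests.
    BigtableIO.Write write =
        BigtableIO.write()
            .withProjectId("my-project")      // placeholder
            .withInstanceId("my-instance")    // placeholder
            .withTableId("my-table")          // placeholder
            .withBigtableOptionsConfigurator(
                builder ->
                    builder.setBulkOptions(
                        new BulkOptions.Builder()
                            // Batch up to 500 row keys per bulk request
                            // (placeholder value; tune for the workload).
                            .setBulkMaxRowKeyCount(500)
                            // Cap each bulk request at ~4 MB (placeholder).
                            .setBulkMaxRequestSize(4L * 1024 * 1024)
                            .build()));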

Thanks,

Sahith

On Tue, Aug 16, 2022 at 1:04 PM Diego Gomez <diego...@google.com> wrote:

> Hello Sahith,
>
> We recommend using BigtableIO over CloudBigtableIO. Both of them have
> similar performance; the main difference is that CloudBigtableIO uses
> HBase Results and Puts, while BigtableIO uses protos to read results and
> write mutations.
>
> The two connectors should result in similar spending on Bigtable's side;
> more write requests don't necessarily mean more cost or more nodes. What
> version of CloudBigtableIO are you using, and are you using an autoscaling
> CBT cluster?
>
> -Diego
>
> On Tue, Aug 16, 2022 at 11:55 AM Sahith Nallapareddy via dev <
> dev@beam.apache.org> wrote:
>
>> Hello,
>>
>> I see that there are two implementations for reading from and writing to
>> Bigtable: one in Beam and one that is referenced in the Google Cloud
>> documentation. Is one preferred over the other? We often use the Beam
>> BigtableIO to write to Bigtable, but I have found that sometimes the
>> default configuration can lead to a lot of write requests (which, it
>> seems, can also lead to more nodes and therefore more cost). I am about
>> to try tuning the bulk options to see if that can increase the batching
>> of mutations, but is there anything else I should try, like switching the
>> actual transform we use?
>>
>> Thanks,
>>
>> Sahith
>>
>