[ https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340124#comment-16340124 ]
Solomon Duskis commented on BEAM-3311: -------------------------------------- I spoke quite a bit with the Beam team about this. BigtableIO should remain as is. It looks like there's a _Flatten.iterables()_ which ought to convert an _Iterable<T>_ to a _T_. The BigtableIO connector is meant to satisfy 80%+ of the use cases. In other cases, I generally look for common usage patterns before a change is made to any connector. In addition to this approach, you can also create your own DoFn that does arbitrary operations against a [BigtableSession|https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-client-core-parent/bigtable-client-core/src/main/java/com/google/cloud/bigtable/grpc/BigtableSession.java]. Be sure to use _BigtableOptions.Builder.setUseCachedDataPool(true)_, if you chose to go down this route. > Extend BigTableIO to write Iterable of KV > ------------------------------------------ > > Key: BEAM-3311 > URL: https://issues.apache.org/jira/browse/BEAM-3311 > Project: Beam > Issue Type: Improvement > Components: sdk-java-gcp > Affects Versions: 2.2.0 > Reporter: Anna Smith > Assignee: Solomon Duskis > Priority: Major > > The motivation is to achieve qps as advertised in BigTable in Dataflow > streaming mode (ex: 300k qps for 30 node cluster). Currently we aren't > seeing this as the bundle size is small in streaming mode and the requests > are overwhelmed by AuthentiationHeader. For example, in order to achieve qps > advertised each payload is recommended to be ~1KB but without batching each > payload is 7KB, the majority of which is the authentication header. > Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>,...> > where batching is done per Bundle on flush in finishBundle. We would like to > be able to manually batch using a DoFn<Iterable<KV<ByteString, > Iterable<Mutation>>>,...> so we can get around the small Bundle size in > streaming. We have seen some improvements in qps to BigTable when running > with Dataflow using this approach. > Initial thoughts on implementation would be to extend Write in order to have > a BulkWrite of Iterable<KV<ByteString, Iterable<Mutation>>>. -- This message was sent by Atlassian JIRA (v7.6.3#76005)