[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587393#comment-16587393 ] Solomon Duskis commented on BEAM-3342:
--
The Cloud Bigtable client is just about ready with full functionality. It did indeed take longer than we expected. Once the client is released, there's a likelihood that a Python write connector will significantly underperform compared to Java, since the Python client only performs synchronous operations, whereas the Java client has a high-throughput asynchronous writer. Also, in terms of reading from Cloud Bigtable, any connector needs full support for a BoundedSource, or something like it. We could not figure out how to make BoundedSource work in Python.

> Create a Cloud Bigtable Python connector
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Solomon Duskis
> Assignee: Solomon Duskis
> Priority: Major
>
> I would like to create a Cloud Bigtable python connector.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-3246) BigtableIO should merge splits if they exceed 15K
[ https://issues.apache.org/jira/browse/BEAM-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solomon Duskis resolved BEAM-3246.
--
Resolution: Fixed
Fix Version/s: 2.5.0

This issue was fixed with [this commit|https://github.com/apache/beam/commit/7dbcb11ff1cb9f2b5f0ffdb63bb38b686fdb0c71].

> BigtableIO should merge splits if they exceed 15K
>
> Key: BEAM-3246
> URL: https://issues.apache.org/jira/browse/BEAM-3246
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Solomon Duskis
> Assignee: Solomon Duskis
> Priority: Major
> Fix For: 2.5.0
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> A customer hit a problem with a large number of splits. CloudBigtableIO fixes that here:
> https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241
> BigtableIO should have similar logic.
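The split-merging idea (coalesce adjacent key ranges until the split count drops under the cap) can be modeled in plain Java. This is an illustrative sketch only: `SplitMerger`, `mergeSplits`, and the int-pair range representation are stand-ins, not the actual BigtableIO or CloudBigtableIO API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitMerger {
    static final int MAX_SPLITS = 15_000; // the cap referenced in the issue

    /** Coalesce adjacent, sorted [start, end) ranges until at most maxSplits remain. */
    public static List<int[]> mergeSplits(List<int[]> sorted, int maxSplits) {
        if (sorted.size() <= maxSplits) {
            return sorted;
        }
        // Number of consecutive source ranges folded into each merged range.
        int groupSize = (sorted.size() + maxSplits - 1) / maxSplits;
        List<int[]> merged = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += groupSize) {
            int last = Math.min(i + groupSize, sorted.size()) - 1;
            // A merged range spans the first range's start to the last range's end.
            merged.add(new int[] {sorted.get(i)[0], sorted.get(last)[1]});
        }
        return merged;
    }
}
```

Because the input ranges are sorted and contiguous, merging adjacent neighbors preserves full key-space coverage while reducing the work-item count the runner must track.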
[jira] [Created] (BEAM-4564) Update Bigtable dependencies
Solomon Duskis created BEAM-4564:
Summary: Update Bigtable dependencies
Key: BEAM-4564
URL: https://issues.apache.org/jira/browse/BEAM-4564
Project: Beam
Issue Type: Improvement
Components: io-java-gcp
Affects Versions: 2.5.0
Reporter: Solomon Duskis
Assignee: Solomon Duskis

Cloud Bigtable's dependencies should be updated. Here are the current versions:
* bigtable.version: 1.0.0
* bigtable.proto.version: 1.0.0-pre3

The new bigtable.version is 1.4.0. The Bigtable protos dependency needs to change to the 0.15.0 version of com.google.api.grpc:proto-google-cloud-bigtable-v2 and com.google.api.grpc:proto-google-cloud-bigtable-admin-v2.
[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector
[ https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458848#comment-16458848 ] Solomon Duskis commented on BEAM-2955:
--
[~iemejia], what's going on with HBaseIO these days? Is it safe to start work on this?

> Create a Cloud Bigtable HBase connector
>
> Key: BEAM-2955
> URL: https://issues.apache.org/jira/browse/BEAM-2955
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Solomon Duskis
> Assignee: Solomon Duskis
> Priority: Major
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a different repo for a while. Recently, we did some reworking of the Cloud Bigtable client that would allow it to better coexist in the Beam ecosystem, and we also released a Beam connector in our repository that exposes HBase idioms rather than the Protobuf idioms of BigtableIO. More information about the customer experience of the HBase connector can be found here: [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase connector. There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase maven project. We can extend HBaseIO for the purposes of CBT. We would have to add some features to HBaseIO to make that work (dynamic rebalancing, and a way for HBase and CBT's size estimation models to coexist).
> # The BigtableIO connector works well, and we can add an adapter layer on top of it. I have a proof of concept of it here: [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work. I would appreciate some guidance and discussion about the right approach.
[jira] [Closed] (BEAM-3311) Extend BigTableIO to write Iterable of KV
[ https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solomon Duskis closed BEAM-3311.
--
Resolution: Won't Fix
Fix Version/s: Not applicable

Use Flatten.iterables() instead of duplicating that functionality in BigtableIO.

> Extend BigTableIO to write Iterable of KV
>
> Key: BEAM-3311
> URL: https://issues.apache.org/jira/browse/BEAM-3311
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Affects Versions: 2.2.0
> Reporter: Anna Smith
> Assignee: Solomon Duskis
> Priority: Major
> Fix For: Not applicable
>
> The motivation is to achieve qps as advertised in BigTable in Dataflow streaming mode (ex: 300k qps for a 30 node cluster). Currently we aren't seeing this, as the bundle size is small in streaming mode and the requests are overwhelmed by the authentication header. For example, in order to achieve the advertised qps each payload is recommended to be ~1KB, but without batching each payload is 7KB, the majority of which is the authentication header.
> Currently BigtableIO supports DoFn<KV<ByteString, Iterable<Mutation>>, ...> where batching is done per bundle on flush in finishBundle. We would like to be able to manually batch using a DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>, ...> so we can get around the small bundle size in streaming. We have seen some improvements in qps to BigTable when running with Dataflow using this approach.
> Initial thoughts on implementation would be to extend Write in order to have a BulkWrite of Iterable<KV<ByteString, Iterable<Mutation>>>.
[jira] [Commented] (BEAM-3311) Extend BigTableIO to write Iterable of KV
[ https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340124#comment-16340124 ] Solomon Duskis commented on BEAM-3311:
--
I spoke quite a bit with the Beam team about this. BigtableIO should remain as is. It looks like there's a _Flatten.iterables()_ which ought to convert an _Iterable<T>_ to a _T_. The BigtableIO connector is meant to satisfy 80%+ of the use cases. In other cases, I generally look for common usage patterns before a change is made to any connector. In addition to this approach, you can also create your own DoFn that does arbitrary operations against a [BigtableSession|https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-client-core-parent/bigtable-client-core/src/main/java/com/google/cloud/bigtable/grpc/BigtableSession.java]. Be sure to use _BigtableOptions.Builder.setUseCachedDataPool(true)_ if you choose to go down this route.
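The semantics of the recommended _Flatten.iterables()_ transform (turning a collection of iterables into a collection of their elements) can be sketched in plain Java. The class and method names below are illustrative stand-ins, not the Beam API itself, which operates on PCollections:

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenSketch {
    /** Mirrors Flatten.iterables() semantics: Iterable<Iterable<T>> -> List<T>. */
    public static <T> List<T> flatten(Iterable<? extends Iterable<T>> batches) {
        List<T> out = new ArrayList<>();
        for (Iterable<T> batch : batches) {
            for (T element : batch) {
                out.add(element); // emit each inner element individually
            }
        }
        return out;
    }
}
```

In a real pipeline the equivalent step would sit between the manually-batching DoFn and BigtableIO.Write, so the connector still receives individual KV elements.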
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336429#comment-16336429 ] Solomon Duskis commented on BEAM-3342:
--
It turns out that we have quite a bit of work to do on the core Cloud Bigtable python client in order to make an effective Beam connector. It could be a while before the client is ready.
[jira] [Commented] (BEAM-3098) Upgrade Java grpc version
[ https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335013#comment-16335013 ] Solomon Duskis commented on BEAM-3098:
--
Ping.

> Upgrade Java grpc version
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Solomon Duskis
> Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March. It would be great if the dependency could be updated to something newer, like grpc 1.7.0.
[jira] [Commented] (BEAM-3412) Update BigTable client version to 1.0
[ https://issues.apache.org/jira/browse/BEAM-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334635#comment-16334635 ] Solomon Duskis commented on BEAM-3412:
--
I submitted [https://github.com/apache/beam/pull/4462]. Basically, I used a for loop + _addMutations()_ instead of _addAllMutations()_. I also added mock tests for _BigtableServiceImpl_ so that future upgrades won't cause problems like this one.

> Update BigTable client version to 1.0
>
> Key: BEAM-3412
> URL: https://issues.apache.org/jira/browse/BEAM-3412
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Reporter: Chamikara Jayalath
> Assignee: Solomon Duskis
> Priority: Major
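The workaround described (looping with an element-wise add instead of a single bulk-add call) produces the same request contents either way. A minimal plain-Java sketch of the pattern, using a hypothetical `Builder` class (not the actual protobuf-generated request builder):

```java
import java.util.ArrayList;
import java.util.List;

public class RequestSketch {
    /** Hypothetical stand-in for a protobuf-style request builder. */
    static class Builder {
        final List<String> mutations = new ArrayList<>();

        // Element-wise add, as in the workaround's for loop.
        Builder addMutation(String mutation) {
            mutations.add(mutation);
            return this;
        }
    }

    /** Build a request by adding mutations one at a time in a loop. */
    public static List<String> build(Iterable<String> mutations) {
        Builder builder = new Builder();
        for (String mutation : mutations) {
            builder.addMutation(mutation);
        }
        return builder.mutations;
    }
}
```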
[jira] [Commented] (BEAM-3412) Update BigTable client version to 1.0
[ https://issues.apache.org/jira/browse/BEAM-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331108#comment-16331108 ] Solomon Duskis commented on BEAM-3412:
--
[~chamikara]: we cannot use [bigtable-hbase-1.x-shaded|https://mvnrepository.com/artifact/com.google.cloud.bigtable/bigtable-hbase-1.x-shaded/1.0.0]. That artifact is required for our CloudBigtableIO implementation to coexist with BigtableIO. For this specific issue, we might actually have a simple workaround. Long term, we need to consider the following:
* Beam should upgrade grpc / protobuf versions. Yes, it's difficult. However, having dependencies that are years out of date causes other issues.
* CloudBigtableIO should be replaced with a new implementation that lives in the Beam repository.
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289655#comment-16289655 ] Solomon Duskis commented on BEAM-3342:
--
I started with a simple pipeline that writes to Cloud Bigtable via the google.cloud bigtable package, which works locally with google.cloud installed, but doesn't work when I use a dataflow runner. Here's what I get:
==
message: "Not processing workitem 2633526545277283048 since a deferred exception was found:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'bigtable'
==
Can I use the standard google.cloud bigtable client? If so, how? And why don't BigQuery and Storage use the google.cloud clients?
[jira] [Created] (BEAM-3342) Create a Cloud Bigtable Python connector
Solomon Duskis created BEAM-3342:
Summary: Create a Cloud Bigtable Python connector
Key: BEAM-3342
URL: https://issues.apache.org/jira/browse/BEAM-3342
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: Solomon Duskis
Assignee: Ahmet Altay

I would like to create a Cloud Bigtable python connector.
[jira] [Commented] (BEAM-3311) Extend BigTableIO to write Iterable of KV
[ https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283044#comment-16283044 ] Solomon Duskis commented on BEAM-3311:
--
I definitely agree that larger bundles are important. I would need help from the Beam team at large to figure out a general solution to this problem. Here are some useful examples of controlling bundling using Beam constructs that I got from the Dataflow team, which you can use to create a solution that would work in your specific case:
* Here is an example of how to use a stateful DoFn to buffer and push back data: https://beam.apache.org/blog/2017/08/28/timely-processing.html. Using a stateful DoFn will allow you to control exactly when data is output to BigtableIO, but it is more complicated to write and get correct.
* Alternatively, you can add a set of steps which will buffer data using a trigger:
PubSubIO -> ... original pipeline ... -> ParDo(choose a random key in [0, 1000)) -> Window.into(new GlobalWindows()).triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(10)))) -> GBK -> Values -> BigtableIO
The logic behind the above pipeline is that you're regrouping all your data into a fixed key space [0, 1000) in the global window and then attempting to write to BigtableIO every 10 seconds. This will give you average bundles of: (data output from the original pipeline in 10 seconds) / 1000 keys. The good thing is that the transform needed to write is easy, and you push all the buffering logic to the system instead of owning it. The bad thing is that you're rewindowing, which may not work depending on whether you're writing windowing information to BigTable.
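The random-key regrouping step above can be modeled in plain Java to show the batching effect. This is an illustrative sketch (`KeyedBatcher` and its method are made-up names); a real pipeline would use Beam's ParDo, GroupByKey, and processing-time triggers rather than an in-memory map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class KeyedBatcher {
    /** Assign each element a random key in [0, numKeys) and group per key. */
    public static <T> Map<Integer, List<T>> batchByRandomKey(
            List<T> elements, int numKeys, long seed) {
        Random random = new Random(seed); // seeded here only for reproducibility
        Map<Integer, List<T>> batches = new HashMap<>();
        for (T element : elements) {
            int key = random.nextInt(numKeys);
            // Everything sharing a key lands in one batch, as GBK would produce.
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
        }
        return batches;
    }
}
```

With N elements arriving per trigger firing and 1000 keys, each group (and hence each write bundle) averages N / 1000 elements, which is the effect described in the comment.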
[jira] [Commented] (BEAM-3098) Upgrade Java grpc version
[ https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282659#comment-16282659 ] Solomon Duskis commented on BEAM-3098:
--
grpc 1.7.0+ allows for tcnative to be shaded. Cloud Bigtable does that now with our CloudBigtableIO client, but ideally, we should use the same grpc version as everyone else. Which client libraries other than Cloud Bigtable shade away gRPC et al?
[jira] [Assigned] (BEAM-3311) Extend BigTableIO to write Iterable of KV
[ https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solomon Duskis reassigned BEAM-3311:
Assignee: Solomon Duskis (was: Chamikara Jayalath)
[jira] [Commented] (BEAM-3154) Support multiple KeyRanges when reading from BigTable
[ https://issues.apache.org/jira/browse/BEAM-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267732#comment-16267732 ] Solomon Duskis commented on BEAM-3154:
--
This is non-trivial. We probably won't get to it this year.

> Support multiple KeyRanges when reading from BigTable
>
> Key: BEAM-3154
> URL: https://issues.apache.org/jira/browse/BEAM-3154
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Reporter: Ryan Niemocienski
> Assignee: Solomon Duskis
> Priority: Minor
>
> BigTableIO.Read currently only supports reading one KeyRange from BT. It would be nice to read multiple ranges from BigTable in one read. Thoughts on the feasibility of this before I dig into it?
[jira] [Created] (BEAM-3246) BigtableIO should merge splits if they exceed 15K
Solomon Duskis created BEAM-3246:
Summary: BigtableIO should merge splits if they exceed 15K
Key: BEAM-3246
URL: https://issues.apache.org/jira/browse/BEAM-3246
Project: Beam
Issue Type: Bug
Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Solomon Duskis

A customer hit a problem with a large number of splits. CloudBigtableIO fixes that here:
https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241
BigtableIO should have similar logic.
[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector
[ https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219783#comment-16219783 ] Solomon Duskis commented on BEAM-2955:
--
The problem is that Cloud Bigtable needs the following things:
# A different method for splitting.
# A different configuration mechanism for Cloud Bigtable specific configuration. The configuration mechanism would also require the use of ValueProvider for templating purposes.
# A custom Cloud Bigtable oriented metric for expressing throttling.
# A custom way to use MultiRowRangeFilter (which is different between Cloud Bigtable and HBase).
There are probably other differences I'm missing. A Service works for issue #1, but not for the rest. There definitely is room for reuse, but I'm not sure if passing a Service to HBaseIO is the right way to do it.
[jira] [Created] (BEAM-3098) Upgrade Java grpc version
Solomon Duskis created BEAM-3098:
Summary: Upgrade Java grpc version
Key: BEAM-3098
URL: https://issues.apache.org/jira/browse/BEAM-3098
Project: Beam
Issue Type: Improvement
Components: sdk-java-core
Reporter: Solomon Duskis
Assignee: Kenneth Knowles

Beam Java currently depends on grpc 1.2, which was released in March. It would be great if the dependency could be updated to something newer, like grpc 1.7.0.
[jira] [Commented] (BEAM-3008) BigtableIO should use ValueProviders
[ https://issues.apache.org/jira/browse/BEAM-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188971#comment-16188971 ] Solomon Duskis commented on BEAM-3008:
--
Cloud Bigtable is a constrained problem. Writes need a Cloud project id, instance id and table name. Reads also need a scan. HBaseIO currently takes in a Configuration object. If there's a small set of HBase configuration key/value pairs, then it absolutely makes sense to have HBase specific configuration options. HBaseIO and CloudBigtable need different configuration options. I think that we can create an AbstractHBaseIO that defers Connection creation to a child which would have the more specific configuration options.

> BigtableIO should use ValueProviders
>
> Key: BEAM-3008
> URL: https://issues.apache.org/jira/browse/BEAM-3008
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-gcp
> Reporter: Solomon Duskis
> Assignee: Solomon Duskis
>
> [https://github.com/apache/beam/pull/2057] is an effort towards BigtableIO templatization. This Issue is a request to get a fully featured template for BigtableIO.
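The AbstractHBaseIO idea (shared logic in a base class, Connection creation deferred to a child that owns its own configuration) is the template-method pattern. A minimal plain-Java sketch with illustrative names, not actual Beam or HBase classes:

```java
public class TemplateSketch {
    /** Hypothetical stand-in for an HBase/Bigtable connection. */
    interface Connection {
        String describe();
    }

    /** Shared IO logic; connection creation is deferred to subclasses. */
    abstract static class AbstractIO {
        protected abstract Connection createConnection();

        public String open() {
            return createConnection().describe(); // shared code calls the hook
        }
    }

    /** Child supplies Bigtable-specific configuration (project/instance ids). */
    static class BigtableStyleIO extends AbstractIO {
        private final String projectId;
        private final String instanceId;

        BigtableStyleIO(String projectId, String instanceId) {
            this.projectId = projectId;
            this.instanceId = instanceId;
        }

        @Override
        protected Connection createConnection() {
            return () -> "bigtable:" + projectId + "/" + instanceId;
        }
    }
}
```

The point of the design is that the HBase-flavored subclass would keep its Configuration object while the Bigtable-flavored subclass could take project/instance ids (potentially as ValueProviders for templating), with neither leaking into the shared base.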
[jira] [Created] (BEAM-3008) BigtableIO should use ValueProviders
Solomon Duskis created BEAM-3008:
Summary: BigtableIO should use ValueProviders
Key: BEAM-3008
URL: https://issues.apache.org/jira/browse/BEAM-3008
Project: Beam
Issue Type: New Feature
Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Solomon Duskis

[https://github.com/apache/beam/pull/2057] is an effort towards BigtableIO templatization. This Issue is a request to get a fully featured template for BigtableIO.
[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector
[ https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165523#comment-16165523 ] Solomon Duskis commented on BEAM-2955:
--
Chamikara: HBaseIO will have to be extended or wrapped. Cloud Bigtable needs slightly different configuration options, has a different way to calculate estimated sizes, and needs templating. The interface would essentially be the same whether we leverage HBaseIO or BigtableIO. The BigtableIO wrapper that I wrote was 271 lines of code. I'll create a PR for the BigtableIO wrapper in the Beam github project, since the code is already written. I'll also create a PR for an extension of HBaseIO. That way, we can compare the two options.
[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector
[ https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165431#comment-16165431 ] Solomon Duskis commented on BEAM-2955:
--
It's awesome that you added the dynamic rebalancing! I'm ok with extending HBaseIO, as long as there aren't any other overriding concerns. I'd like to explore the possibility of templates (ValueProviders) as the configuration of HBaseIO.
[jira] [Created] (BEAM-2955) Create a Cloud Bigtable HBase connector
Solomon Duskis created BEAM-2955:
Summary: Create a Cloud Bigtable HBase connector
Key: BEAM-2955
URL: https://issues.apache.org/jira/browse/BEAM-2955
Project: Beam
Issue Type: New Feature
Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Chamikara Jayalath

The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a different repo for a while. Recently, we did some reworking of the Cloud Bigtable client that would allow it to better coexist in the Beam ecosystem, and we also released a Beam connector in our repository that exposes HBase idioms rather than the Protobuf idioms of BigtableIO. More information about the customer experience of the HBase connector can be found here: [https://cloud.google.com/bigtable/docs/dataflow-hbase].
The Beam repo is a much better place to house a Cloud Bigtable HBase connector. There are a couple of ways we can implement this new connector:
# The CBT connector depends on artifacts in the io/hbase maven project. We can extend HBaseIO for the purposes of CBT. We would have to add some features to HBaseIO to make that work (dynamic rebalancing, and a way for HBase and CBT's size estimation models to coexist).
# The BigtableIO connector works well, and we can add an adapter layer on top of it. I have a proof of concept of it here: [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
# We can build a separate CBT HBase connector.
I'm happy to do the work. I would appreciate some guidance and discussion about the right approach.
[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row
[ https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157507#comment-16157507 ] Solomon Duskis commented on BEAM-2545: -- FYI, the Cloud Bigtable team released a version of CloudBigtableIO that works for Beam. I need to open a new issue here to discuss the possibility of creating a Cloud Bigtable beam connector that uses HBase objects. There are a few ways we can go with that, and a few potential pitfalls that ought to be discussed. > bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row > > > Key: BEAM-2545 > URL: https://issues.apache.org/jira/browse/BEAM-2545 > Project: Beam > Issue Type: Bug > Components: sdk-java-gcp >Reporter: Stephen Sisk >Assignee: Chamikara Jayalath > > The BigtableWriteIT is taking a long time (~10min) and throwing errors. > Example test run: > https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/ > (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing > to Bigtable. First 10 errors: > Error mutating row key00175 with mutations [set_cell { > family_name: "cf" > value: "value00175" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00176 with mutations [set_cell { > family_name: "cf" > value: "value00176" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00177 with mutations [set_cell { > family_name: "cf" > value: "value00177" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00178 with mutations [set_cell { > family_name: "cf" > value: "value00178" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00179 with mutations [set_cell { > family_name: "cf" > value: "value00179" > } > ]: UNKNOWN: Stale requests. 
> Error mutating row key00180 with mutations [set_cell { > family_name: "cf" > value: "value00180" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00181 with mutations [set_cell { > family_name: "cf" > value: "value00181" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00182 with mutations [set_cell { > family_name: "cf" > value: "value00182" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00183 with mutations [set_cell { > family_name: "cf" > value: "value00183" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00184 with mutations [set_cell { > family_name: "cf" > value: "value00184" > } > ]: UNKNOWN: Stale requests. > at > org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655) > at > org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607) > Stacktrace > java.lang.RuntimeException: > (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing > to Bigtable. First 10 errors: > Error mutating row key00175 with mutations [set_cell { > family_name: "cf" > value: "value00175" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00176 with mutations [set_cell { > family_name: "cf" > value: "value00176" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00177 with mutations [set_cell { > family_name: "cf" > value: "value00177" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00178 with mutations [set_cell { > family_name: "cf" > value: "value00178" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00179 with mutations [set_cell { > family_name: "cf" > value: "value00179" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00180 with mutations [set_cell { > family_name: "cf" > value: "value00180" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00181 with mutations [set_cell { > family_name: "cf" > value: "value00181" > } > ]: UNKNOWN: Stale requests. 
> Error mutating row key00182 with mutations [set_cell { > family_name: "cf" > value: "value00182" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00183 with mutations [set_cell { > family_name: "cf" > value: "value00183" > } > ]: UNKNOWN: Stale requests. > Error mutating row key00184 with mutations [set_cell { > family_name: "cf" > value: "value00184" > } > ]: UNKNOWN: Stale requests. > at > org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655) > at > org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607) > at > org.apache.beam.run
[jira] [Comment Edited] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row
[ https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157488#comment-16157488 ] Solomon Duskis edited comment on BEAM-2545 at 9/7/17 7:32 PM: -- Yes. 1.0.0-pre3 is the proper version to choose. It should fix that problem. Users should be able to explicitly add com.google.cloud.bigtable:bigtable-client-core:1.0.0-pre3 to their maven/gradle configurations to fix this problem with Beam 2.1.0. was (Author: sduskis): Yes. 1.0.0-pre3 is the proper version to choose. It should fix that problem. Users should be able to explicitly add com.google.cloud.bigtabl:bigtable-client-core:1.0.0-pre3 to their maven/gradle configurations to fix this problem with Beam 2.1.0. > bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row > > > Key: BEAM-2545 > URL: https://issues.apache.org/jira/browse/BEAM-2545 > Project: Beam > Issue Type: Bug > Components: sdk-java-gcp >Reporter: Stephen Sisk >Assignee: Chamikara Jayalath
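For reference, the explicit dependency pin described in the comment above would look like this in a Maven pom.xml (coordinates taken from the comment, with the group id spelled out in full):

```xml
<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-client-core</artifactId>
  <version>1.0.0-pre3</version>
</dependency>
```

The Gradle equivalent is a single dependency line using the same com.google.cloud.bigtable:bigtable-client-core:1.0.0-pre3 coordinates.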
[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row
[ https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157488#comment-16157488 ] Solomon Duskis commented on BEAM-2545: -- Yes. 1.0.0-pre3 is the proper version to choose. It should fix that problem. Users should be able to explicitly add com.google.cloud.bigtable:bigtable-client-core:1.0.0-pre3 to their maven/gradle configurations to fix this problem with Beam 2.1.0. > bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row > > > Key: BEAM-2545 > URL: https://issues.apache.org/jira/browse/BEAM-2545 > Project: Beam > Issue Type: Bug > Components: sdk-java-gcp >Reporter: Stephen Sisk >Assignee: Chamikara Jayalath
[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row
[ https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069316#comment-16069316 ] Solomon Duskis commented on BEAM-2545: -- I would suggest upgrading to the 1.0.0-pre1 release. We did a complete overhaul of BulkMutation between 0.9.7.1 and 1.0.0-pre1. We didn't see "stale requests" in our tests of 0.9.7.1, but we did see some stuck requests under heavy load. 1.0.0-pre1 didn't exhibit any of the problems we saw in earlier versions. > bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row > > > Key: BEAM-2545 > URL: https://issues.apache.org/jira/browse/BEAM-2545 > Project: Beam > Issue Type: Bug > Components: sdk-java-gcp >Reporter: Stephen Sisk >Assignee: Stephen Sisk > Fix For: 2.1.0
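The BulkMutation machinery discussed above is, at its core, a buffer that accumulates row mutations and flushes them in batches. A minimal, dependency-free Java sketch of that batching shape (the BulkBuffer name and the flusher callback are hypothetical; the real client layers retries, size accounting, and asynchronous completion on top of this):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal batching buffer: collect mutations, flush when the batch is full.
class BulkBuffer<M> {
    private final int batchSize;
    private final Consumer<List<M>> flusher;
    private final List<M> pending = new ArrayList<>();

    BulkBuffer(int batchSize, Consumer<List<M>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    void add(M mutation) {
        pending.add(mutation);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Flush whatever is pending, e.g. from a finishBundle-style hook.
    void flush() {
        if (!pending.isEmpty()) {
            flusher.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

The trailing flush matters: a connector that only flushes on full batches would lose the last partial batch of a bundle, which is why BigtableIO-style writers also flush in finishBundle.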
[jira] [Commented] (BEAM-2395) BigtableIO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056286#comment-16056286 ] Solomon Duskis commented on BEAM-2395: -- The Python Cloud Bigtable client is missing some key features that the Java client has relating to robustness and performance. Specifically, the Java client has a "smart retries" feature that allows writes and reads to proceed despite temporary error conditions. The Python client also needs to use the "bulk write" API for performance purposes. Without those features, a Python Cloud Bigtable connector should not be considered ready for production. FWIW, there are ongoing efforts to add those features. > BigtableIO for Python SDK > - > > Key: BEAM-2395 > URL: https://issues.apache.org/jira/browse/BEAM-2395 > Project: Beam > Issue Type: New Feature > Components: sdk-py >Reporter: Matthias Baetens >Assignee: Matthias Baetens > Labels: features > > Developing a read and write IO for BigTable for the Python SDK. > Working / design document can be found here: > https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing
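The "smart retries" feature mentioned above comes down to retrying idempotent reads and writes when transient errors occur. A minimal sketch of that control flow, written in Java to match the rest of this thread and assuming a fixed attempt budget (the real client also distinguishes retryable from non-retryable status codes and applies exponential backoff with jitter):

```java
import java.util.function.Supplier;

class Retries {
    // Retry a transient-failure-prone operation up to maxAttempts times.
    static <T> T withRetries(Supplier<T> op, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // In a real client: only retry retryable codes, then back off before the next attempt.
                last = e;
            }
        }
        throw last; // Budget exhausted: surface the final failure.
    }
}
```

Only idempotent operations (reads, SetCell-style mutations) are safe to retry this way, which is part of why the feature requires deliberate design work in the Python client rather than a blanket retry wrapper.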
[jira] [Created] (BEAM-2181) Upgrade Bigtable dependency to 0.9.6.2
Solomon Duskis created BEAM-2181: Summary: Upgrade Bigtable dependency to 0.9.6.2 Key: BEAM-2181 URL: https://issues.apache.org/jira/browse/BEAM-2181 Project: Beam Issue Type: Bug Components: sdk-java-gcp Reporter: Solomon Duskis Assignee: Daniel Halperin Cloud Bigtable 0.9.6.2 has some fixes relating to: 1) using dependencies for GCP protobuf objects rather than including generated artifacts directly in bigtable-protos, 2) BulkMutation bug fixes, 3) auth token management, and 4) using fewer grpc experimental features. All are important in the context of Beam, so the Beam dependency should be upgraded. One snag came up: BigtableSession.isAlpnProviderEnabled() was removed in order to reduce the number of grpc experimental features, so BigtableServiceImpl.tableExists() can no longer depend on isAlpnProviderEnabled(). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (BEAM-1269) BigtableIO should make more efficient use of connections
[ https://issues.apache.org/jira/browse/BEAM-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948013#comment-15948013 ] Solomon Duskis commented on BEAM-1269: -- BigtableIO should not set data channel pool counts for reads. This is the current code:

    // Set data channel count to one because there is only 1 scanner in this session
    BigtableOptions.Builder clonedBuilder = options.toBuilder()
        .setDataChannelCount(1);
    BigtableOptions optionsWithAgent =
        clonedBuilder.setUserAgent(getBeamSdkPartOfUserAgent()).build();

It should be more like:

    BigtableOptions optionsWithAgent = options.toBuilder()
        .setUserAgent(getBeamSdkPartOfUserAgent())
        .setUseCachedDataPool(true)
        .setDataHost(BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT)
        .build();

> BigtableIO should make more efficient use of connections > > > Key: BEAM-1269 > URL: https://issues.apache.org/jira/browse/BEAM-1269 > Project: Beam > Issue Type: Improvement > Components: sdk-java-gcp >Reporter: Daniel Halperin > Labels: newbie, starter > > Right now, {{BigtableIO}} opens up a new Bigtable session for every DoFn, in > the {{@Setup}} function. However, sessions can support multiple connections, > so perhaps this code should be modified to open up a smaller session pool and > then allocate connections in {{@StartBundle}}. > This would likely make more efficient use of resources, especially for highly > multithreaded workers.
[jira] [Commented] (BEAM-1269) BigtableIO should make more efficient use of connections
[ https://issues.apache.org/jira/browse/BEAM-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947642#comment-15947642 ] Solomon Duskis commented on BEAM-1269: -- Cloud Bigtable client 0.9.6 was just released, and should be flowing through the Maven repo process now. This feature can be invoked via BigtableOptions.setUseCachedDataPool(true). I have a follow-up request to also set BigtableOptions.setDataHost(BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT), which will be a host dedicated to batch-type workloads like Dataflow. > BigtableIO should make more efficient use of connections > > > Key: BEAM-1269 > URL: https://issues.apache.org/jira/browse/BEAM-1269 > Project: Beam > Issue Type: Improvement > Components: sdk-java-gcp >Reporter: Daniel Halperin > > Right now, {{BigtableIO}} opens up a new Bigtable session for every DoFn, in > the {{@Setup}} function. However, sessions can support multiple connections, > so perhaps this code should be modified to open up a smaller session pool and > then allocate connections in {{@StartBundle}}. > This would likely make more efficient use of resources, especially for highly > multithreaded workers.
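The setUseCachedDataPool(true) setting discussed in these comments amounts to sharing one channel pool per distinct set of connection options, rather than opening a fresh pool in every DoFn's {{@Setup}}. A dependency-free Java sketch of that caching shape (the CachedPool name is a hypothetical stand-in for the client's internal cache):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Share one expensive resource per distinct options key, instead of one per worker thread.
class CachedPool<K, V> {
    private final ConcurrentMap<K, V> pools = new ConcurrentHashMap<>();
    private final Function<K, V> factory;

    CachedPool(Function<K, V> factory) {
        this.factory = factory;
    }

    // computeIfAbsent guarantees at most one pool is created per key, even when
    // many DoFn instances ask for it concurrently.
    V get(K optionsKey) {
        return pools.computeIfAbsent(optionsKey, factory);
    }
}
```

This is why the cached pool helps highly multithreaded workers: hundreds of DoFn instances with identical options end up multiplexed over one shared pool instead of each paying for its own connections.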