[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-08-21 Thread Solomon Duskis (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587393#comment-16587393
 ] 

Solomon Duskis commented on BEAM-3342:
--

The Cloud Bigtable client is just about ready with full functionality.  It did 
indeed take longer than we expected.  Even once it's done, there's a good 
chance that a Python write connector will significantly underperform compared 
to Java, since the Python client only performs synchronous operations, whereas 
the Java client has a high-throughput asynchronous writer.

Also, in terms of reading from Cloud Bigtable, any connector needs full support 
for a BoundedSource, or something like it.  We could not figure out how to make 
BoundedSource work in Python.

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3246) BigtableIO should merge splits if they exceed 15K

2018-06-25 Thread Solomon Duskis (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solomon Duskis resolved BEAM-3246.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

This issue was fixed with [this 
commit|https://github.com/apache/beam/commit/7dbcb11ff1cb9f2b5f0ffdb63bb38b686fdb0c71].

> BigtableIO should merge splits if they exceed 15K
> -
>
> Key: BEAM-3246
> URL: https://issues.apache.org/jira/browse/BEAM-3246
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> A customer hit a problem with a large number of splits.  CloudBigtableIO fixes 
> that here 
> https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241
> BigtableIO should have similar logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4564) Update Bigtable dependencies

2018-06-14 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-4564:


 Summary: Update Bigtable dependencies
 Key: BEAM-4564
 URL: https://issues.apache.org/jira/browse/BEAM-4564
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Affects Versions: 2.5.0
Reporter: Solomon Duskis
Assignee: Solomon Duskis


Cloud Bigtable's dependencies should be updated:

Here are the current versions:
 * bigtable.version: 1.0.0
 * bigtable.proto.version: 1.0.0-pre3

The new bigtable.version is 1.4.0.
The Bigtable protos dependency needs to change to the 0.15.0 version of 
com.google.api.grpc:proto-google-cloud-bigtable-v2 and 
com.google.api.grpc:proto-google-cloud-bigtable-admin-v2.
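As a rough sketch, the change might look like the following in a Maven POM 
(the property name and placement are assumptions based on the versions quoted 
above, not verified against Beam's actual build files):

```xml
<!-- Hypothetical pom.xml fragment illustrating the version bump -->
<properties>
  <!-- was 1.0.0 -->
  <bigtable.version>1.4.0</bigtable.version>
</properties>

<dependencies>
  <!-- replaces the old bigtable.proto.version 1.0.0-pre3 artifacts -->
  <dependency>
    <groupId>com.google.api.grpc</groupId>
    <artifactId>proto-google-cloud-bigtable-v2</artifactId>
    <version>0.15.0</version>
  </dependency>
  <dependency>
    <groupId>com.google.api.grpc</groupId>
    <artifactId>proto-google-cloud-bigtable-admin-v2</artifactId>
    <version>0.15.0</version>
  </dependency>
</dependencies>
```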



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector

2018-04-30 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458848#comment-16458848
 ] 

Solomon Duskis commented on BEAM-2955:
--

[~iemejia], what's going on with HBaseIO these days?  Is it safe to start work 
on this?

> Create a Cloud Bigtable HBase connector
> ---
>
> Key: BEAM-2955
> URL: https://issues.apache.org/jira/browse/BEAM-2955
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
> different repo for awhile. Recently, we did some reworking of the Cloud 
> Bigtable client that would allow it to better coexist in the Beam ecosystem, 
> and we also released a Beam connector in our repository that exposes HBase 
> idioms rather than the Protobuf idioms of BigtableIO.  More information about 
> the customer experience of the HBase connector can be found here: 
> [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase 
> connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase maven project.  We 
> can extend HBaseIO for the purposes of CBT.  We would have to 
> add some features to HBaseIO to make that work (dynamic rebalancing, and a 
> way for HBase and CBT's size estimation models to coexist)
> # The BigtableIO connector works well, and we can add an adapter layer on top 
> of it.  I have a proof of concept of it here: 
> [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion 
> about the right approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-3311) Extend BigTableIO to write Iterable of KV

2018-01-25 Thread Solomon Duskis (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solomon Duskis closed BEAM-3311.

   Resolution: Won't Fix
Fix Version/s: Not applicable

Use Flatten.iterable() instead of duplicating that functionality in BigtableIO.

> Extend BigTableIO to write Iterable of KV 
> --
>
> Key: BEAM-3311
> URL: https://issues.apache.org/jira/browse/BEAM-3311
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 2.2.0
>Reporter: Anna Smith
>Assignee: Solomon Duskis
>Priority: Major
> Fix For: Not applicable
>
>
> The motivation is to achieve the qps advertised for BigTable in Dataflow 
> streaming mode (ex: 300k qps for a 30 node cluster).  Currently we aren't 
> seeing this, as the bundle size is small in streaming mode and the requests 
> are overwhelmed by the AuthenticationHeader.  For example, in order to 
> achieve the advertised qps each payload is recommended to be ~1KB, but 
> without batching each payload is 7KB, the majority of which is the 
> authentication header.
> Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>, ...>, 
> where batching is done per Bundle on flush in finishBundle. We would like to 
> be able to manually batch using a 
> DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>, ...> so we can get around 
> the small Bundle size in streaming.  We have seen some improvements in qps to 
> BigTable when running with Dataflow using this approach.
> Initial thoughts on implementation would be to extend Write in order to have 
> a BulkWrite of Iterable<KV<ByteString, Iterable<Mutation>>>.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3311) Extend BigTableIO to write Iterable of KV

2018-01-25 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340124#comment-16340124
 ] 

Solomon Duskis commented on BEAM-3311:
--

I spoke quite a bit with the Beam team about this.  BigtableIO should remain as 
is.  It looks like there's a _Flatten.iterables()_ which ought to convert an 
_Iterable<T>_ to a _T_.  The BigtableIO connector is meant to satisfy 80%+ of 
the use cases.  In other cases, I generally look for common usage patterns 
before a change is made to any connector.

In addition to this approach, you can also create your own DoFn that does 
arbitrary operations against a 
[BigtableSession|https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-client-core-parent/bigtable-client-core/src/main/java/com/google/cloud/bigtable/grpc/BigtableSession.java].
  Be sure to use _BigtableOptions.Builder.setUseCachedDataPool(true)_ if you 
choose to go down this route.
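Conceptually, _Flatten.iterables()_ explodes each incoming Iterable element 
into individual elements. A plain-Java sketch of that semantics, with no Beam 
types (the class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of what Flatten.iterables() does in a pipeline:
// each Iterable<T> element in the input becomes individual T elements.
class FlattenSketch {
    static <T> List<T> flattenIterables(List<? extends Iterable<T>> groups) {
        List<T> out = new ArrayList<>();
        for (Iterable<T> group : groups) {
            for (T element : group) {
                out.add(element);  // emit each element on its own
            }
        }
        return out;
    }
}
```

This is why a transform producing Iterable<KV<...>> elements can be followed 
by Flatten.iterables() and then feed an unmodified BigtableIO.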

> Extend BigTableIO to write Iterable of KV 
> --
>
> Key: BEAM-3311
> URL: https://issues.apache.org/jira/browse/BEAM-3311
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 2.2.0
>Reporter: Anna Smith
>Assignee: Solomon Duskis
>Priority: Major
>
> The motivation is to achieve the qps advertised for BigTable in Dataflow 
> streaming mode (ex: 300k qps for a 30 node cluster).  Currently we aren't 
> seeing this, as the bundle size is small in streaming mode and the requests 
> are overwhelmed by the AuthenticationHeader.  For example, in order to 
> achieve the advertised qps each payload is recommended to be ~1KB, but 
> without batching each payload is 7KB, the majority of which is the 
> authentication header.
> Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>, ...>, 
> where batching is done per Bundle on flush in finishBundle. We would like to 
> be able to manually batch using a 
> DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>, ...> so we can get around 
> the small Bundle size in streaming.  We have seen some improvements in qps to 
> BigTable when running with Dataflow using this approach.
> Initial thoughts on implementation would be to extend Write in order to have 
> a BulkWrite of Iterable<KV<ByteString, Iterable<Mutation>>>.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-01-23 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336429#comment-16336429
 ] 

Solomon Duskis commented on BEAM-3342:
--

It turns out that we have quite a bit of work to do on the core Cloud Bigtable 
python client in order to make an effective Beam connector.  It could be a 
while before the client is ready.  

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2018-01-22 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335013#comment-16335013
 ] 

Solomon Duskis commented on BEAM-3098:
--

Ping.

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>Priority: Major
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3412) Update BigTable client version to 1.0

2018-01-22 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334635#comment-16334635
 ] 

Solomon Duskis commented on BEAM-3412:
--

I submitted [https://github.com/apache/beam/pull/4462].  Basically, I used a 
for loop + _addMutations()_ instead of _addAllMutations()_.  I also added mock 
tests for _BigtableServiceImpl_ so that future upgrades won't cause problems 
like this one.

> Update BigTable client version to 1.0
> -
>
> Key: BEAM-3412
> URL: https://issues.apache.org/jira/browse/BEAM-3412
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Chamikara Jayalath
>Assignee: Solomon Duskis
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3412) Update BigTable client version to 1.0

2018-01-18 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331108#comment-16331108
 ] 

Solomon Duskis commented on BEAM-3412:
--

[~chamikara]: we cannot use 
[bigtable-hbase-1.x-shaded|https://mvnrepository.com/artifact/com.google.cloud.bigtable/bigtable-hbase-1.x-shaded/1.0.0].
  That artifact is required for our CloudBigtableIO implementation to coexist 
with BigtableIO.  For this specific issue, we might actually have a simple 
workaround.

Long term, we need to consider the following:
 * Beam should upgrade its grpc / protobuf versions.  Yes, it's difficult.  
However, having dependencies that are years out of date causes other issues.
 * CloudBigtableIO should be replaced with a new implementation that lives in 
the Beam repository.

> Update BigTable client version to 1.0
> -
>
> Key: BEAM-3412
> URL: https://issues.apache.org/jira/browse/BEAM-3412
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Chamikara Jayalath
>Assignee: Solomon Duskis
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2017-12-13 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289655#comment-16289655
 ] 

Solomon Duskis commented on BEAM-3342:
--

I started with a simple pipeline that writes to Cloud Bigtable via the 
google.cloud bigtable package, which works locally with google.cloud installed, 
but doesn't work when I use a dataflow runner.  Here's what I get:

==
  message:  "Not processing workitem 2633526545277283048 since a deferred 
exception was found: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", 
line 706, in run
self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", 
line 446, in _load_main_session
pickler.load_session(session_file)
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 
247, in load_session
return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in 
load_session
module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in 
_import_module
return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'bigtable'
==

Can I use the standard google.cloud bigtable client?  If so, how?  And why 
don't BigQuery and Storage use the google.cloud clients?

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Ahmet Altay
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3342) Create a Cloud Bigtable Python connector

2017-12-13 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-3342:


 Summary: Create a Cloud Bigtable Python connector
 Key: BEAM-3342
 URL: https://issues.apache.org/jira/browse/BEAM-3342
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Solomon Duskis
Assignee: Ahmet Altay


I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3311) Extend BigTableIO to write Iterable of KV

2017-12-07 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283044#comment-16283044
 ] 

Solomon Duskis commented on BEAM-3311:
--

I definitely agree that larger bundles are important.  I would need help from 
the Beam team at large to figure out a general solution to this problem.

Here are some useful examples, from the Dataflow team, of controlling bundling 
using Beam constructs; you can use them to build a solution that works in your 
specific case:

* Here is an example of how to use a stateful DoFn to buffer and push back 
data: https://beam.apache.org/blog/2017/08/28/timely-processing.html. Using a 
stateful DoFn will allow you to control exactly when data is output to 
BigtableIO, but it is more complicated to write and get correct.

* Alternatively, you can add a set of steps which will buffer data using a 
trigger:
PubSubIO -> ... original pipeline ... -> ParDo(Choose a random key in [0, 
1000)) -> Window.into(new 
GlobalWindows()).triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(10)))) 
 -> GBK -> Values -> BigtableIO

The logic behind the above pipeline is that you're regrouping all your data 
into a fixed key space [0, 1000) in the global window and then attempting to 
write to BigtableIO every 10 seconds. This will give you average bundles of: 
(data output from the original pipeline in 10 seconds) / (1000 keys).

The good thing is that the transform needed to write is easy, and you push all 
the buffering logic to the system instead of owning it. The bad thing is that 
you're rewindowing, which may not work depending on whether you're writing 
windowing information to BigTable.
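The "choose a random key" step above spreads records across a fixed key space 
so each key accumulates a batch. A plain-Java sketch of that spreading, with 
no Beam types (class and method names are hypothetical; in the real pipeline 
GroupByKey under the processing-time trigger does the grouping):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Assign each record a random key in [0, numKeys), then group by key.
// Each group corresponds to one batched write in the pipeline above.
class RandomKeyBatching {
    static Map<Integer, List<String>> groupByRandomKey(List<String> records, int numKeys) {
        Map<Integer, List<String>> buckets = new HashMap<>();
        for (String record : records) {
            int key = ThreadLocalRandom.current().nextInt(numKeys);  // key in [0, numKeys)
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
        return buckets;
    }
}
```

With 1000 keys and a 10-second trigger, the average batch is (elements 
produced in 10 seconds) / 1000, which is where the bundle-size estimate in the 
comment comes from.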

> Extend BigTableIO to write Iterable of KV 
> --
>
> Key: BEAM-3311
> URL: https://issues.apache.org/jira/browse/BEAM-3311
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 2.2.0
>Reporter: Anna Smith
>Assignee: Solomon Duskis
>
> The motivation is to achieve the qps advertised for BigTable in Dataflow 
> streaming mode (ex: 300k qps for a 30 node cluster).  Currently we aren't 
> seeing this, as the bundle size is small in streaming mode and the requests 
> are overwhelmed by the AuthenticationHeader.  For example, in order to 
> achieve the advertised qps each payload is recommended to be ~1KB, but 
> without batching each payload is 7KB, the majority of which is the 
> authentication header.
> Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>, ...>, 
> where batching is done per Bundle on flush in finishBundle. We would like to 
> be able to manually batch using a 
> DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>, ...> so we can get around 
> the small Bundle size in streaming.  We have seen some improvements in qps to 
> BigTable when running with Dataflow using this approach.
> Initial thoughts on implementation would be to extend Write in order to have 
> a BulkWrite of Iterable<KV<ByteString, Iterable<Mutation>>>.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3098) Upgrade Java grpc version

2017-12-07 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282659#comment-16282659
 ] 

Solomon Duskis commented on BEAM-3098:
--

grpc 1.7.0+ allows for tcnative to be shaded.  Cloud Bigtable does that now 
with our CloudBigtableIO client, but ideally, we should use the same grpc 
version as everyone else.

Which client libraries other than Cloud Bigtable shade away gRPC et al?

> Upgrade Java grpc version
> -
>
> Key: BEAM-3098
> URL: https://issues.apache.org/jira/browse/BEAM-3098
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Solomon Duskis
>
> Beam Java currently depends on grpc 1.2, which was released in March.  It 
> would be great if the dependency could be updated to something newer, like 
> grpc 1.7.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-3311) Extend BigTableIO to write Iterable of KV

2017-12-07 Thread Solomon Duskis (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solomon Duskis reassigned BEAM-3311:


Assignee: Solomon Duskis  (was: Chamikara Jayalath)

> Extend BigTableIO to write Iterable of KV 
> --
>
> Key: BEAM-3311
> URL: https://issues.apache.org/jira/browse/BEAM-3311
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 2.2.0
>Reporter: Anna Smith
>Assignee: Solomon Duskis
>
> The motivation is to achieve the qps advertised for BigTable in Dataflow 
> streaming mode (ex: 300k qps for a 30 node cluster).  Currently we aren't 
> seeing this, as the bundle size is small in streaming mode and the requests 
> are overwhelmed by the AuthenticationHeader.  For example, in order to 
> achieve the advertised qps each payload is recommended to be ~1KB, but 
> without batching each payload is 7KB, the majority of which is the 
> authentication header.
> Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>, ...>, 
> where batching is done per Bundle on flush in finishBundle. We would like to 
> be able to manually batch using a 
> DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>, ...> so we can get around 
> the small Bundle size in streaming.  We have seen some improvements in qps to 
> BigTable when running with Dataflow using this approach.
> Initial thoughts on implementation would be to extend Write in order to have 
> a BulkWrite of Iterable<KV<ByteString, Iterable<Mutation>>>.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3154) Support multiple KeyRanges when reading from BigTable

2017-11-27 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267732#comment-16267732
 ] 

Solomon Duskis commented on BEAM-3154:
--

This is non-trivial.  We probably won't get to it this year.

> Support multiple KeyRanges when reading from BigTable
> -
>
> Key: BEAM-3154
> URL: https://issues.apache.org/jira/browse/BEAM-3154
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Ryan Niemocienski
>Assignee: Solomon Duskis
>Priority: Minor
>
> BigTableIO.Read currently only supports reading one KeyRange from BT. It 
> would be nice to read multiple ranges from BigTable in one read. Thoughts on 
> the feasibility of this before I dig into it?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3246) BigtableIO should merge splits if they exceed 15K

2017-11-24 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-3246:


 Summary: BigtableIO should merge splits if they exceed 15K
 Key: BEAM-3246
 URL: https://issues.apache.org/jira/browse/BEAM-3246
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Solomon Duskis


A customer hit a problem with a large number of splits.  CloudBigtableIO fixes 
that here 
https://github.com/GoogleCloudPlatform/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-hbase-beam/src/main/java/com/google/cloud/bigtable/beam/CloudBigtableIO.java#L241

BigtableIO should have similar logic.
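The merging idea can be sketched in plain Java (SplitMerger and its types are 
illustrative only; the real CloudBigtableIO code referenced above merges 
BigtableSource key ranges rather than list elements):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: if splitting produced more than maxSplits pieces,
// combine adjacent splits into at most maxSplits groups so the runner
// never sees more than the cap (15,000 in the issue above).
class SplitMerger {
    static <T> List<List<T>> mergeIfNeeded(List<T> splits, int maxSplits) {
        List<List<T>> out = new ArrayList<>();
        if (splits.size() <= maxSplits) {
            // Under the cap: one group per split, nothing to merge.
            for (T s : splits) out.add(Collections.singletonList(s));
            return out;
        }
        // Over the cap: distribute adjacent splits into at most maxSplits groups.
        int groupSize = (splits.size() + maxSplits - 1) / maxSplits;  // ceiling division
        for (int i = 0; i < splits.size(); i += groupSize) {
            out.add(new ArrayList<>(splits.subList(i, Math.min(i + groupSize, splits.size()))));
        }
        return out;
    }
}
```

Merging adjacent splits preserves key ordering, which matters when the splits 
represent contiguous key ranges.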



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector

2017-10-25 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219783#comment-16219783
 ] 

Solomon Duskis commented on BEAM-2955:
--

The problem is that Cloud Bigtable needs the following things:

# A different method for splitting.
# A different configuration mechanism for Cloud Bigtable specific 
configuration.  The configuration mechanism would also require the use of 
ValueProvider for templating purposes.
# A custom Cloud Bigtable oriented metric for expressing throttling.
# A custom way to use MultiRowRangeFilter (which is different between Cloud 
Bigtable and HBase)

There are probably other differences I'm missing.

A Service works for issue #1, but not for the rest.  There definitely is room 
for reuse, but I'm not sure if passing a Service to HBaseIO is the right way to 
do it.

> Create a Cloud Bigtable HBase connector
> ---
>
> Key: BEAM-2955
> URL: https://issues.apache.org/jira/browse/BEAM-2955
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
> different repo for awhile. Recently, we did some reworking of the Cloud 
> Bigtable client that would allow it to better coexist in the Beam ecosystem, 
> and we also released a Beam connector in our repository that exposes HBase 
> idioms rather than the Protobuf idioms of BigtableIO.  More information about 
> the customer experience of the HBase connector can be found here: 
> [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase 
> connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase maven project.  We 
> can extend HBaseIO for the purposes of CBT.  We would have to 
> add some features to HBaseIO to make that work (dynamic rebalancing, and a 
> way for HBase and CBT's size estimation models to coexist)
> # The BigtableIO connector works well, and we can add an adapter layer on top 
> of it.  I have a proof of concept of it here: 
> [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion 
> about the right approach.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3098) Upgrade Java grpc version

2017-10-25 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-3098:


 Summary: Upgrade Java grpc version
 Key: BEAM-3098
 URL: https://issues.apache.org/jira/browse/BEAM-3098
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Solomon Duskis
Assignee: Kenneth Knowles


Beam Java currently depends on grpc 1.2, which was released in March.  It would 
be great if the dependency could be updated to something newer, like grpc 1.7.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3008) BigtableIO should use ValueProviders

2017-10-02 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188971#comment-16188971
 ] 

Solomon Duskis commented on BEAM-3008:
--

Cloud Bigtable is a constrained problem.  Writes need a Cloud project id, 
instance id, and table name.  Reads also need a scan.

HBaseIO currently takes in a Configuration object.  If there's a small set of 
HBase configuration key/value pairs, then it absolutely makes sense to have 
HBase specific configuration options.

HBaseIO and CloudBigtable need different configuration options.  I think that 
we can create an AbstractHBaseIO that defers Connection creation to a child 
which would have the more specific configuration options.
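The proposal above, an abstract base that defers connection creation to 
subclasses, is the classic template-method pattern. A minimal self-contained 
sketch (all names here are hypothetical, not real Beam or HBase classes):

```java
// Stand-in for an HBase/Bigtable connection.
interface Connection {
    String backend();
}

// Shared read/write logic would live here; subclasses only decide how to
// connect, carrying their own backend-specific configuration.
abstract class AbstractHBaseIO {
    protected abstract Connection createConnection();

    String describe() {
        return "connected to " + createConnection().backend();
    }
}

// The Bigtable child supplies Bigtable-specific configuration.
class BigtableHBaseIO extends AbstractHBaseIO {
    private final String projectId;
    private final String instanceId;

    BigtableHBaseIO(String projectId, String instanceId) {
        this.projectId = projectId;
        this.instanceId = instanceId;
    }

    @Override
    protected Connection createConnection() {
        return () -> "bigtable:" + projectId + "/" + instanceId;
    }
}
```

An HBase child would instead take a Configuration object, so each connector 
exposes only the options that make sense for its backend.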

> BigtableIO should use ValueProviders 
> -
>
> Key: BEAM-3008
> URL: https://issues.apache.org/jira/browse/BEAM-3008
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>
> [https://github.com/apache/beam/pull/2057] is an effort towards BigtableIO 
> templatization.  This Issue is a request to get a fully featured template for 
> BigtableIO.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3008) BigtableIO should use ValueProviders

2017-10-02 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-3008:


 Summary: BigtableIO should use ValueProviders 
 Key: BEAM-3008
 URL: https://issues.apache.org/jira/browse/BEAM-3008
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Solomon Duskis


[https://github.com/apache/beam/pull/2057] is an effort towards BigtableIO 
templatization.  This Issue is a request to get a fully featured template for 
BigtableIO.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector

2017-09-13 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165523#comment-16165523
 ] 

Solomon Duskis commented on BEAM-2955:
--

Chamikara: HBaseIO will have to be extended or wrapped.  Cloud Bigtable needs 
slightly different configuration options, has a different way to calculate 
estimated sizes, and needs templating.  The interface would essentially be the 
same whether we leverage HBaseIO or BigtableIO.  The BigtableIO wrapper that I 
wrote was 271 lines of code.  

I'll create a PR for the BigtableIO wrapper in the Beam github project, since 
the code is already written.
I'll also create a PR for an extension of HBaseIO.

That way, we can compare the two options.

> Create a Cloud Bigtable HBase connector
> ---
>
> Key: BEAM-2955
> URL: https://issues.apache.org/jira/browse/BEAM-2955
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
> different repo for awhile. Recently, we did some reworking of the Cloud 
> Bigtable client that would allow it to better coexist in the Beam ecosystem, 
> and we also released a Beam connector in our repository that exposes HBase 
> idioms rather than the Protobuf idioms of BigtableIO.  More information about 
> the customer experience of the HBase connector can be found here: 
> [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase 
> connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase maven project.  We 
> can extend HBaseIO for the purposes of CBT.  We would have to 
> add some features to HBaseIO to make that work (dynamic rebalancing, and a 
> way for HBase and CBT's size estimation models to coexist)
> # The BigtableIO connector works well, and we can add an adapter layer on top 
> of it.  I have a proof of concept of it here: 
> [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion 
> about the right approach.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector

2017-09-13 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165431#comment-16165431
 ] 

Solomon Duskis commented on BEAM-2955:
--

It's awesome that you added the dynamic rebalancing!  I'm ok with extending 
HBaseIO, as long as there aren't any other overriding concerns.  I'd like to 
explore the possibility of templates (ValueProviders) for the configuration of 
HBaseIO.

> Create a Cloud Bigtable HBase connector
> ---
>
> Key: BEAM-2955
> URL: https://issues.apache.org/jira/browse/BEAM-2955
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
> different repo for awhile. Recently, we did some reworking of the Cloud 
> Bigtable client that would allow it to better coexist in the Beam ecosystem, 
> and we also released a Beam connector in our repository that exposes HBase 
> idioms rather than the Protobuf idioms of BigtableIO.  More information about 
> the customer experience of the HBase connector can be found here: 
> [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase 
> connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase maven project.  We 
> can create a new connector that extends HBaseIO for the purposes of CBT.  We 
> would have to add some features to HBaseIO to make that work (dynamic 
> rebalancing, and a way for HBase and CBT's size estimation models to coexist).
> # The BigtableIO connector works well, and we can add an adapter layer on top 
> of it.  I have a proof of concept of it here: 
> [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion 
> about the right approach.





[jira] [Created] (BEAM-2955) Create a Cloud Bigtable HBase connector

2017-09-13 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-2955:


 Summary: Create a Cloud Bigtable HBase connector
 Key: BEAM-2955
 URL: https://issues.apache.org/jira/browse/BEAM-2955
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Chamikara Jayalath


The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
different repo for a while. Recently, we did some reworking of the Cloud 
Bigtable client that would allow it to better coexist in the Beam ecosystem, 
and we also released a Beam connector in our repository that exposes HBase 
idioms rather than the Protobuf idioms of BigtableIO.  More information about 
the customer experience of the HBase connector can be found here: 
[https://cloud.google.com/bigtable/docs/dataflow-hbase].

The Beam repo is a much better place to house a Cloud Bigtable HBase connector. 
 There are a couple of ways we can implement this new connector:

# The CBT connector depends on artifacts in the io/hbase maven project.  We can 
create a new connector that extends HBaseIO for the purposes of CBT.  We would 
have to add some features to HBaseIO to make that work (dynamic rebalancing, 
and a way for HBase and CBT's size estimation models to coexist).
# The BigtableIO connector works well, and we can add an adapter layer on top 
of it.  I have a proof of concept of it here: 
[https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
# We can build a separate CBT HBase connector.

I'm happy to do the work.  I would appreciate some guidance and discussion 
about the right approach.





[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-09-07 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157507#comment-16157507
 ] 

Solomon Duskis commented on BEAM-2545:
--

FYI, the Cloud Bigtable team released a version of CloudBigtableIO that works 
for Beam.  I need to open a new issue here to discuss the possibility of 
creating a Cloud Bigtable Beam connector that uses HBase objects.  There are a 
few ways we can go with that, and a few potential pitfalls that ought to be 
discussed.

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Chamikara Jayalath
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> Error mutating row key00175 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00175"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00176 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00176"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00177 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00177"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00178 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00178"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00179 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00179"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00180 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00180"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00181 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00181"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00182 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00182"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00183 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00183"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00184 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00184"
> }
> ]: UNKNOWN: Stale requests.
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
> Stacktrace
> java.lang.RuntimeException: 
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> Error mutating row key00175 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00175"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00176 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00176"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00177 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00177"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00178 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00178"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00179 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00179"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00180 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00180"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00181 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00181"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00182 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00182"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00183 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00183"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00184 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00184"
> }
> ]: UNKNOWN: Stale requests.
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
>   at 
> org.apache.beam.run

[jira] [Comment Edited] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-09-07 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157488#comment-16157488
 ] 

Solomon Duskis edited comment on BEAM-2545 at 9/7/17 7:32 PM:
--

Yes.  1.0.0-pre3 is the proper version to choose.  It should fix that problem.

Users should be able to explicitly add 
com.google.cloud.bigtable:bigtable-client-core:1.0.0-pre3 to their maven/gradle 
configurations to fix this problem with Beam 2.1.0.


was (Author: sduskis):
Yes.  1.0.0-pre3 is the proper version to choose.  It should fix that problem.

Users should be able to explicitly add 
com.google.cloud.bigtabl:bigtable-client-core:1.0.0-pre3 to their maven/gradle 
configurations to fix this problem with Beam 2.1.0.

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Chamikara Jayalath
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/

[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-09-07 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157488#comment-16157488
 ] 

Solomon Duskis commented on BEAM-2545:
--

Yes.  1.0.0-pre3 is the proper version to choose.  It should fix that problem.

Users should be able to explicitly add 
com.google.cloud.bigtable:bigtable-client-core:1.0.0-pre3 to their maven/gradle 
configurations to fix this problem with Beam 2.1.0.
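
Pinning that dependency explicitly in a Maven pom.xml would look like this 
(coordinates as given above; note the group id is com.google.cloud.bigtable):

```xml
<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-client-core</artifactId>
  <version>1.0.0-pre3</version>
</dependency>
```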

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Chamikara Jayalath
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/

[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-06-29 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069316#comment-16069316
 ] 

Solomon Duskis commented on BEAM-2545:
--

I would suggest upgrading to the 1.0.0-pre1 release.   We did a complete 
overhaul of BulkMutation between 0.9.7.1 and 1.0.0-pre1.  We didn't see "stale 
requests" in our tests of 0.9.7.1, but we saw some stuckness under heavy load.  
1.0.0-pre1 didn't exhibit any of the problems we saw in earlier versions.

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: 2.1.0
>
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/

[jira] [Commented] (BEAM-2395) BigtableIO for Python SDK

2017-06-20 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056286#comment-16056286
 ] 

Solomon Duskis commented on BEAM-2395:
--

The Python Cloud Bigtable client is missing some key features that the Java 
client has relating to robustness and performance.  Specifically, the Java 
client has a "smart retries" feature that allows writes and reads to proceed 
despite temporary error conditions.  The Python client also needs to use the 
"bulk write" API for performance purposes.

Without those features, a Python Cloud Bigtable connector should not be 
considered ready for production.  FWIW, there are ongoing efforts to add those 
features.
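
The "smart retries" behavior described above can be sketched, independently of 
either client library, as exponential backoff around an operation that may fail 
transiently.  This is a JDK-only illustration of the idea, not the actual 
client code:

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Retry a transient-failure-prone call, doubling the wait between attempts.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                 // remember the failure
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;         // exponential backoff
                }
            }
        }
        throw last;                       // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice with a transient error, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("UNAVAILABLE");
            return "ok";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A production retry layer would additionally distinguish retryable from 
non-retryable status codes and resume partially completed bulk writes, which is 
the harder part the Java client implements.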

> BigtableIO for Python SDK
> -
>
> Key: BEAM-2395
> URL: https://issues.apache.org/jira/browse/BEAM-2395
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Matthias Baetens
>Assignee: Matthias Baetens
>  Labels: features
>
> Developing a read and write IO for BigTable for the Python SDK. 
> Working / design document can be found here: 
> https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing





[jira] [Created] (BEAM-2181) Upgrade Bigtable dependency to 0.9.6.2

2017-05-05 Thread Solomon Duskis (JIRA)
Solomon Duskis created BEAM-2181:


 Summary: Upgrade Bigtable dependency to 0.9.6.2
 Key: BEAM-2181
 URL: https://issues.apache.org/jira/browse/BEAM-2181
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
Reporter: Solomon Duskis
Assignee: Daniel Halperin


Cloud Bigtable 0.9.6.2 has some fixes relating to:

1) Using dependencies for GCP protobuf objects rather than including generated 
artifacts directly in bigtable-protos
2) BulkMutation bug fixes
3) Auth token management
4) Using fewer grpc experimental features.

All are important in the context of beam, so the beam dependency should be 
upgraded.

One snag came up.  BigtableSession.isAlpnProviderEnabled() was removed in order 
to reduce the number of grpc experimental features.  
BigtableServiceImpl.tableExists() can no longer depend on 
isAlpnProviderEnabled().





[jira] [Commented] (BEAM-1269) BigtableIO should make more efficient use of connections

2017-03-29 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948013#comment-15948013
 ] 

Solomon Duskis commented on BEAM-1269:
--

BigtableIO should not set the data channel pool count for reads.  This is the 
current code:

  // Set data channel count to one because there is only 1 scanner in this session
  BigtableOptions.Builder clonedBuilder = options.toBuilder()
      .setDataChannelCount(1);
  BigtableOptions optionsWithAgent =
      clonedBuilder.setUserAgent(getBeamSdkPartOfUserAgent()).build();

It should be more like:

  BigtableOptions optionsWithAgent = options
      .toBuilder()
      .setUserAgent(getBeamSdkPartOfUserAgent())
      .setUseCachedDataPool(true)
      .setDataHost(BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT)
      .build();


> BigtableIO should make more efficient use of connections
> 
>
> Key: BEAM-1269
> URL: https://issues.apache.org/jira/browse/BEAM-1269
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> Right now, {{BigtableIO}} opens up a new Bigtable session for every DoFn, in 
> the {{@Setup}} function. However, sessions can support multiple connections, 
> so perhaps this code should be modified to open up a smaller session pool and 
> then allocate connections in {{@StartBundle}}.
> This would likely make more efficient use of resources, especially for highly 
> multithreaded workers.
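
The pooling idea in the issue (one shared session pool per worker JVM, with 
individual connections checked out per bundle) can be sketched with the JDK 
alone.  The names below are illustrative; this is not Beam or Bigtable client 
code, and a Semaphore stands in for the connection pool:

```java
import java.util.concurrent.Semaphore;

public class SharedPoolSketch {
    static final int POOL_SIZE = 4;
    // Lazily created once, shared across all DoFn instances in the JVM.
    private static volatile Semaphore pool;

    static Semaphore sharedPool() {
        if (pool == null) {
            synchronized (SharedPoolSketch.class) {
                if (pool == null) pool = new Semaphore(POOL_SIZE);
            }
        }
        return pool;
    }

    // Would be called from @StartBundle: borrow one connection slot.
    static void startBundle() throws InterruptedException { sharedPool().acquire(); }

    // Would be called from @FinishBundle: return the slot to the pool.
    static void finishBundle() { sharedPool().release(); }

    public static void main(String[] args) throws InterruptedException {
        startBundle();
        System.out.println("available: " + sharedPool().availablePermits());
        finishBundle();
        System.out.println("available: " + sharedPool().availablePermits());
    }
}
```

The double-checked locking keeps setup cheap while guaranteeing a single pool 
per JVM, which is the resource-sharing property the issue is asking for.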



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1269) BigtableIO should make more efficient use of connections

2017-03-29 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947642#comment-15947642
 ] 

Solomon Duskis commented on BEAM-1269:
--

Cloud Bigtable client 0.9.6 was just released, and should be flowing through 
the maven repo process now.

This feature can be invoked via BigtableOptions.setUseCachedDataPool(true)

I have a follow up request to also set 
BigtableOptions.setDataHost(BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT) 
which will be a host dedicated to Batch type workloads like Dataflow.

> BigtableIO should make more efficient use of connections
> 
>
> Key: BEAM-1269
> URL: https://issues.apache.org/jira/browse/BEAM-1269
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>
> Right now, {{BigtableIO}} opens up a new Bigtable session for every DoFn, in 
> the {{@Setup}} function. However, sessions can support multiple connections, 
> so perhaps this code should be modified to open up a smaller session pool and 
> then allocate connections in {{@StartBundle}}.
> This would likely make more efficient use of resources, especially for highly 
> multithreaded workers.


