[jira] [Created] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5190:
--

 Summary: Python pipeline options are not picked correctly by 
PortableRunner
 Key: BEAM-5190
 URL: https://issues.apache.org/jira/browse/BEAM-5190
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


Python SDK worker is deserializing the pipeline options to dictionary instead 
of PipelineOptions

Sample log

[grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
[u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
False, u'beam:option:dataflow_endpoint:v1': u'https://dataflow.googleapis.com', 
u'beam:option:sdk_location:v1': 
u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
 u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
u'beam:option:save_main_session:v1': True, 
u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
u'localhost:8099', u'beam:option:job_name:v1': 
u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
u'beam:option:project:v1': u'google.com:clouddfe', 
u'beam:option:pipeline_type_check:v1': True, 
u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588483#comment-16588483
 ] 

Ankur Goenka commented on BEAM-5190:


cc: [~thw] This blocks the worker_thread option on python SDKHarness.

> Python pipeline options are not picked correctly by PortableRunner
> --
>
> Key: BEAM-5190
> URL: https://issues.apache.org/jira/browse/BEAM-5190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Python SDK worker is deserializing the pipeline options to dictionary instead 
> of PipelineOptions
> Sample log
> [grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
> started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
> u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
> [u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
> u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
> u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
> u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
> False, u'beam:option:dataflow_endpoint:v1': 
> u'https://dataflow.googleapis.com', u'beam:option:sdk_location:v1': 
> u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
>  u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
> u'beam:option:save_main_session:v1': True, 
> u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
> u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
> u'localhost:8099', u'beam:option:job_name:v1': 
> u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
> u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
> u'beam:option:project:v1': u'google.com:clouddfe', 
> u'beam:option:pipeline_type_check:v1': True, 
> u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3026) Improve retrying in ElasticSearch client

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3026?focusedWorklogId=136853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136853
 ]

ASF GitHub Bot logged work on BEAM-3026:


Author: ASF GitHub Bot
Created on: 22/Aug/18 07:36
Start Date: 22/Aug/18 07:36
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6146: 
[BEAM-3026] Adding retrying behavior on ElasticSearchIO
URL: https://github.com/apache/beam/pull/6146#discussion_r211655755
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java
 ##
 @@ -512,4 +529,53 @@ public Void apply(Iterable input) {
   return null;
 }
   }
+
+  /** Test that the default predicate correctly parses chosen error code. */
+  public void testDefaultRetryPredicate(RestClient restClient) throws 
IOException {
+String okRequest =
+"{ \"index\" : { \"_index\" : \"test\", \"_type\" : \"doc\", \"_id\" : 
\"1\" } }\n"
++ "{ \"field1\" : 1 }\n";
+String badRequest =
+"{ \"index\" : { \"_index\" : \"test\", \"_type\" : \"doc\", \"_id\" : 
\"1\" } }\n"
++ "{ \"field1\" : @ }\n";
+
+HttpEntity entity1 = new NStringEntity(badRequest, 
ContentType.APPLICATION_JSON);
+Response response1 =
+restClient.performRequest("POST", "/_bulk", Collections.emptyMap(), 
entity1);
+assertTrue(CUSTOM_RETRY_PREDICATE.test(response1));
+
+HttpEntity entity2 = new NStringEntity(okRequest, 
ContentType.APPLICATION_JSON);
+Response response2 =
+restClient.performRequest("POST", "/_bulk", Collections.emptyMap(), 
entity2);
+assertFalse(DEFAULT_RETRY_PREDICATE.test(response2));
+  }
+
+  /**
+   * Test that retries are invoked when Elasticsearch returns a specific error 
code. We invoke this
+   * by issuing corrupt data and retrying on the `400` error code. Normal 
behaviour is to retry on
+   * `429` only but that is difficult to simulate reliably. The logger is used 
to verify expected
+   * behavior.
+   */
+  public void testWriteRetry() throws Throwable {
+expectedException.expect(IOException.class);
+// max attempt is 3, but retry is 2 which excludes 1st attempt when error 
was identified and retry started.
+expectedException.expectMessage(
+String.format(ElasticsearchIO.Write.WriteFn.RETRY_FAILED_LOG, 
EXPECTED_RETRIES));
+
+String data[] = {"{ \"x\" :a,\"y\":\"ab\" }"};
+ElasticsearchIO.Write write =
+ElasticsearchIO.write()
+.withConnectionConfiguration(connectionConfiguration)
+.withRetryConfiguration(
+ElasticsearchIO.RetryConfiguration.create(MAX_ATTEMPTS, 
Duration.millis(35000))
+.withRetryPredicate(CUSTOM_RETRY_PREDICATE));
+pipeline.apply(Create.of(Arrays.asList(data))).apply(write);
+try {
+  pipeline.run();
+} catch (Exception ex) {
+  throw ex.getCause();
+}
+//when there is no exception, test should be failed
+fail();
 
 Review comment:
   can't you just remove the` try/catch` and the` fail() `and let the exception 
raise? the `expectedException` junit rule should make the test fail if no 
exception is thrown and make it succeed if an exception is thrown


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136853)
Time Spent: 12h 10m  (was: 12h)

> Improve retrying in ElasticSearch client
> 
>
> Key: BEAM-3026
> URL: https://issues.apache.org/jira/browse/BEAM-3026
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Currently an overloaded ES server will result in clients failing fast.
> I suggest implementing backoff pauses.  Perhaps something like this:
> {code}
> ElasticsearchIO.ConnectionConfiguration conn = 
> ElasticsearchIO.ConnectionConfiguration
>   .create(new String[]{"http://...:9200"}, "test", "test")
>   .retryWithWaitStrategy(WaitStrategies.exponentialBackoff(1000, 
> TimeUnit.MILLISECONDS)
>   .retryWithStopStrategy(StopStrategies.stopAfterAttempt(10)
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3026) Improve retrying in ElasticSearch client

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3026?focusedWorklogId=136854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136854
 ]

ASF GitHub Bot logged work on BEAM-3026:


Author: ASF GitHub Bot
Created on: 22/Aug/18 07:36
Start Date: 22/Aug/18 07:36
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6146: 
[BEAM-3026] Adding retrying behavior on ElasticSearchIO
URL: https://github.com/apache/beam/pull/6146#discussion_r211653468
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java
 ##
 @@ -512,4 +529,53 @@ public Void apply(Iterable input) {
   return null;
 }
   }
+
+  /** Test that the default predicate correctly parses chosen error code. */
+  public void testDefaultRetryPredicate(RestClient restClient) throws 
IOException {
+String okRequest =
+"{ \"index\" : { \"_index\" : \"test\", \"_type\" : \"doc\", \"_id\" : 
\"1\" } }\n"
++ "{ \"field1\" : 1 }\n";
+String badRequest =
+"{ \"index\" : { \"_index\" : \"test\", \"_type\" : \"doc\", \"_id\" : 
\"1\" } }\n"
++ "{ \"field1\" : @ }\n";
+
+HttpEntity entity1 = new NStringEntity(badRequest, 
ContentType.APPLICATION_JSON);
+Response response1 =
+restClient.performRequest("POST", "/_bulk", Collections.emptyMap(), 
entity1);
+assertTrue(CUSTOM_RETRY_PREDICATE.test(response1));
+
+HttpEntity entity2 = new NStringEntity(okRequest, 
ContentType.APPLICATION_JSON);
+Response response2 =
+restClient.performRequest("POST", "/_bulk", Collections.emptyMap(), 
entity2);
+assertFalse(DEFAULT_RETRY_PREDICATE.test(response2));
+  }
+
+  /**
+   * Test that retries are invoked when Elasticsearch returns a specific error 
code. We invoke this
+   * by issuing corrupt data and retrying on the `400` error code. Normal 
behaviour is to retry on
+   * `429` only but that is difficult to simulate reliably. The logger is used 
to verify expected
+   * behavior.
+   */
+  public void testWriteRetry() throws Throwable {
+expectedException.expect(IOException.class);
+// max attempt is 3, but retry is 2 which excludes 1st attempt when error 
was identified and retry started.
+expectedException.expectMessage(
+String.format(ElasticsearchIO.Write.WriteFn.RETRY_FAILED_LOG, 
EXPECTED_RETRIES));
+
+String data[] = {"{ \"x\" :a,\"y\":\"ab\" }"};
+ElasticsearchIO.Write write =
+ElasticsearchIO.write()
+.withConnectionConfiguration(connectionConfiguration)
+.withRetryConfiguration(
+ElasticsearchIO.RetryConfiguration.create(MAX_ATTEMPTS, 
Duration.millis(35000))
 
 Review comment:
   use a const


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136854)
Time Spent: 12h 10m  (was: 12h)

> Improve retrying in ElasticSearch client
> 
>
> Key: BEAM-3026
> URL: https://issues.apache.org/jira/browse/BEAM-3026
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Currently an overloaded ES server will result in clients failing fast.
> I suggest implementing backoff pauses.  Perhaps something like this:
> {code}
> ElasticsearchIO.ConnectionConfiguration conn = 
> ElasticsearchIO.ConnectionConfiguration
>   .create(new String[]{"http://...:9200"}, "test", "test")
>   .retryWithWaitStrategy(WaitStrategies.exponentialBackoff(1000, 
> TimeUnit.MILLISECONDS)
>   .retryWithStopStrategy(StopStrategies.stopAfterAttempt(10)
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3026) Improve retrying in ElasticSearch client

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3026?focusedWorklogId=136855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136855
 ]

ASF GitHub Bot logged work on BEAM-3026:


Author: ASF GitHub Bot
Created on: 22/Aug/18 07:36
Start Date: 22/Aug/18 07:36
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6146: 
[BEAM-3026] Adding retrying behavior on ElasticSearchIO
URL: https://github.com/apache/beam/pull/6146#discussion_r211855088
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java
 ##
 @@ -512,4 +527,44 @@ public Void apply(Iterable input) {
   return null;
 }
   }
+
+  /** Test that the default predicate correctly parses chosen error code. */
+  public void testDefaultRetryPredicate(RestClient restClient) throws 
IOException {
+assertFalse(DEFAULT_RETRY_PREDICATE.test(new IOException("test")));
+String x =
+"{ \"index\" : { \"_index\" : \"test\", \"_type\" : \"doc\", \"_id\" : 
\"1\" } }\n"
++ "{ \"field1\" : @ }\n";
+HttpEntity entity = new NStringEntity(x, ContentType.APPLICATION_JSON);
+
+Response response = restClient.performRequest("POST", "/_bulk", 
Collections.emptyMap(), entity);
+assertTrue(CUSTOM_RETRY_PREDICATE.test(new ResponseException(response)));
+  }
+
+  /**
+   * Test that retries are invoked when Elasticsearch returns a specific error 
code. We invoke this
+   * by issuing corrupt data and retrying on the `400` error code. Normal 
behaviour is to retry on
+   * `429` only but that is difficult to simulate reliably. The logger is used 
to verify expected
 
 Review comment:
   Indeed very difficult to simulate with an http 400 bad formatted doc. The 
batch will be retried until maxAttempts is reached because it will still 
contain a bad formatted doc. The only way would be to test with an error that 
can be temporary (like an ES status) and control the embeded ES through a 
separate thread in the UT to change that status. But now that you changed the 
logic in `handleRetry()` and considering that you test your predicate in a 
separate UT, I think it is fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136855)
Time Spent: 12h 20m  (was: 12h 10m)

> Improve retrying in ElasticSearch client
> 
>
> Key: BEAM-3026
> URL: https://issues.apache.org/jira/browse/BEAM-3026
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Currently an overloaded ES server will result in clients failing fast.
> I suggest implementing backoff pauses.  Perhaps something like this:
> {code}
> ElasticsearchIO.ConnectionConfiguration conn = 
> ElasticsearchIO.ConnectionConfiguration
>   .create(new String[]{"http://...:9200"}, "test", "test")
>   .retryWithWaitStrategy(WaitStrategies.exponentialBackoff(1000, 
> TimeUnit.MILLISECONDS)
>   .retryWithStopStrategy(StopStrategies.stopAfterAttempt(10)
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5184) Multimap side inputs with duplicate keys and values are being lost

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5184?focusedWorklogId=136860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136860
 ]

ASF GitHub Bot logged work on BEAM-5184:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:12
Start Date: 22/Aug/18 08:12
Worklog Time Spent: 10m 
  Work Description: VaclavPlajt commented on issue #6257: [BEAM-5184] 
Multimap side inputs with duplicate keys and values are being lost
URL: https://github.com/apache/beam/pull/6257#issuecomment-414949954
 
 
   It seems that Google Cloud Dataflow Runner did not pass one of the updated 
tests. I tried with direct runner and it pass. Can somebody check the test?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136860)
Time Spent: 1h 20m  (was: 1h 10m)

> Multimap side inputs with duplicate keys and values are being lost
> --
>
> Key: BEAM-5184
> URL: https://issues.apache.org/jira/browse/BEAM-5184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Side inputs with duplicate values are being lost due to the usage of a set 
> based multimap.
> [https://github.com/apache/beam/blob/05fb694f265dda0254d7256e938e508fec9ba098/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L293]
>  
> Originating thread: 
> [https://lists.apache.org/thread.html/48bae7cf71bf6851622cdee0e8bc8619c79c4c2273ed63f288202169@%3Cdev.beam.apache.org%3E]
>  
> Please update the existing tests to exercise this scenario as well: 
> https://github.com/apache/beam/blob/9f23ffc97535e7255245f3852b9d2f0939df5a0a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L507



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136863
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:15
Start Date: 22/Aug/18 08:15
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6211: [BEAM-5107] Support 
ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414950773
 
 
   @dattran-vn01 @timrobertson100 starting the review


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136863)
Time Spent: 5h 20m  (was: 5h 10m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-22 Thread Vaclav Plajt (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588527#comment-16588527
 ] 

Vaclav Plajt commented on BEAM-5124:


To Apache Beam-site pull request: https://github.com/apache/beam-site/pull/540

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5077) Use type informations about output, key and values from operators to set coders

2018-08-22 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-5077.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Use type informations about output, key and values from operators to set 
> coders
> ---
>
> Key: BEAM-5077
> URL: https://issues.apache.org/jira/browse/BEAM-5077
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> Operators can (optionally) carry `TypeDescriptor` of its output, key and 
> value types. These can be used to determine right coders for `PCollection`s 
> during translation to Beam API.
> Another enhancements of coders resolution are desired:
>  * Allow API user to disable use of `KryoCoder` as fallback coder.
>  * Every automatically created `KryoCoder` needs to respects classes 
> registrations provided by user.
>  * Automate creation of 'PairCoder' whenever needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5099) Replace Euphoria's Pair with Beam's KV

2018-08-22 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-5099.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Replace Euphoria's Pair with Beam's KV
> --
>
> Key: BEAM-5099
> URL: https://issues.apache.org/jira/browse/BEAM-5099
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> Euphoria's `org.apache.beam.sdk.extensions.euphoria.core.client.util.Pair` 
> and Beam's `org.apache.beam.sdk.values.KV` are in fact the same objects. 
> Delete `Pair` and use `KV` in Beamphoria instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5123) Testsuite do not run all tests it should

2018-08-22 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-5123.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Testsuite do not run all tests it should
> 
>
> Key: BEAM-5123
> URL: https://issues.apache.org/jira/browse/BEAM-5123
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> For example `...testkit.JoinTest` is ignored.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-4851) Allow all operators to carry type descriptors of its element

2018-08-22 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-4851.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Allow all operators to carry type descriptors of its element
> 
>
> Key: BEAM-4851
> URL: https://issues.apache.org/jira/browse/BEAM-4851
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> We need to have an source of `TypeDesctiptor` for each processed element 
> type in order to set/obtain right coder for them. So each operator should be 
> able to tell what is a type of its output, key and value (if applicable).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5160) Fix failing `ReduceWindow` test

2018-08-22 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-5160.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Fix failing `ReduceWindow` test
> ---
>
> Key: BEAM-5160
> URL: https://issues.apache.org/jira/browse/BEAM-5160
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> `...testkit.ReduceWindowTest` contains one failing now ignored test test. Fix 
> it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136865
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:28
Start Date: 22/Aug/18 08:28
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6211: [BEAM-5107] Support 
ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414954428
 
 
   @dattran-vn01 regarding your commit history, please do not merge master into 
your feature branch. Instead rebase your branch onto master to have a clean 
history. See beam contribution guide: we prefer not having parasite merge 
commits in the history. The only merge commits allowed are the ones that merge 
PRs into master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136865)
Time Spent: 5.5h  (was: 5h 20m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588549#comment-16588549
 ] 

Ankur Goenka commented on BEAM-5190:


Options are rewritten to URN causing the issue.

https://github.com/apache/beam/blob/7c41e0a915083bd3b1fe52c2a417fa38a00e6463/sdks/python/apache_beam/runners/portability/portable_runner.py#L107

> Python pipeline options are not picked correctly by PortableRunner
> --
>
> Key: BEAM-5190
> URL: https://issues.apache.org/jira/browse/BEAM-5190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Python SDK worker is deserializing the pipeline options to dictionary instead 
> of PipelineOptions
> Sample log
> [grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
> started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
> u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
> [u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
> u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
> u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
> u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
> False, u'beam:option:dataflow_endpoint:v1': 
> u'https://dataflow.googleapis.com', u'beam:option:sdk_location:v1': 
> u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
>  u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
> u'beam:option:save_main_session:v1': True, 
> u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
> u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
> u'localhost:8099', u'beam:option:job_name:v1': 
> u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
> u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
> u'beam:option:project:v1': u'google.com:clouddfe', 
> u'beam:option:pipeline_type_check:v1': True, 
> u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3026) Improve retrying in ElasticSearch client

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3026?focusedWorklogId=136867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136867
 ]

ASF GitHub Bot logged work on BEAM-3026:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:36
Start Date: 22/Aug/18 08:36
Worklog Time Spent: 10m 
  Work Description: aalbatross commented on issue #6146: [BEAM-3026] Adding 
retrying behavior on ElasticSearchIO
URL: https://github.com/apache/beam/pull/6146#issuecomment-414956841
 
 
   @echauchot 
   I updated the changes
   a. added constants for test cases
   b. removed fail(), could not remove try-catch block as i need to assert 
against cause of exception (which is IOException), so keeping it unchanged.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136867)
Time Spent: 12.5h  (was: 12h 20m)

> Improve retrying in ElasticSearch client
> 
>
> Key: BEAM-3026
> URL: https://issues.apache.org/jira/browse/BEAM-3026
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Currently an overloaded ES server will result in clients failing fast.
> I suggest implementing backoff pauses.  Perhaps something like this:
> {code}
> ElasticsearchIO.ConnectionConfiguration conn = 
> ElasticsearchIO.ConnectionConfiguration
>   .create(new String[]{"http://...:9200"}, "test", "test")
>   .retryWithWaitStrategy(WaitStrategies.exponentialBackoff(1000, 
> TimeUnit.MILLISECONDS)
>   .retryWithStopStrategy(StopStrategies.stopAfterAttempt(10)
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588555#comment-16588555
 ] 

Etienne Chauchot commented on BEAM-5107:


A general comment: this ticket is the 4th ticket related to ES v6 upgrade 3 of 
them are duplicates. Please search the tickets before opening a new one. 

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588555#comment-16588555
 ] 

Etienne Chauchot edited comment on BEAM-5107 at 8/22/18 8:39 AM:
-

[~dattran.vn01] A general comment: this ticket is the 4th ticket related to ES 
v6 upgrade 3 of them are duplicates. Please search the tickets before opening a 
new one. 


was (Author: echauchot):
A general comment: this ticket is the 4th ticket related to ES v6 upgrade 3 of 
them are duplicates. Please search the tickets before opening a new one. 

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread Dat Tran (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588572#comment-16588572
 ] 

Dat Tran commented on BEAM-5107:


Thank you. [~echauchot]
I'm sorry this is the first time I contributed to open source so I did not 
aware of that.
About my PR, I think I should close it and create new PR without merging master 
into my feature branch.

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136871
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:47
Start Date: 22/Aug/18 08:47
Worklog Time Spent: 10m 
  Work Description: dattran-vn01 commented on issue #6211: [BEAM-5107] 
Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414959899
 
 
   Thank you. @echauchot Let me close this PR and create a new one without 
merging master into my feature branch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136871)
Time Spent: 5h 40m  (was: 5.5h)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136872
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:48
Start Date: 22/Aug/18 08:48
Worklog Time Spent: 10m 
  Work Description: dattran-vn01 edited a comment on issue #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414959899
 
 
   Thank you @echauchot Let me close this PR and create a new one without 
merging master into my feature branch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136872)
Time Spent: 5h 50m  (was: 5h 40m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136873
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:48
Start Date: 22/Aug/18 08:48
Worklog Time Spent: 10m 
  Work Description: dattran-vn01 edited a comment on issue #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414959899
 
 
   Thank you @echauchot Let me close this PR and create a new one without 
merging master into my feature branch. I will do it in 4 hours. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136873)
Time Spent: 6h  (was: 5h 50m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-3199) Upgrade to Elasticsearch 6.x

2018-08-22 Thread Etienne Chauchot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-3199.
--
   Resolution: Duplicate
Fix Version/s: Not applicable

I'm closing this ticket because another ES6 upgrade ticket was opened 
(https://issues.apache.org/jira/browse/BEAM-5107) and a PR related to this 
ticket was submitted.

[~jeroens], if you want to submit a complementary PR we could then reopen the 
current ticket and update the subject a bit.

> Upgrade to Elasticsearch 6.x
> 
>
> Key: BEAM-3199
> URL: https://issues.apache.org/jira/browse/BEAM-3199
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Jean-Baptiste Onofré
>Assignee: Jeroen Steggink
>Priority: Major
> Fix For: Not applicable
>
>
> Elasticsearch 6.x is now GA. As it's fully compatible with Elasticsearch 5.x, 
> it makes sense to upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4435) Add a jenkins job for ElasticsearchIOIT v5

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588574#comment-16588574
 ] 

Etienne Chauchot commented on BEAM-4435:


[~timrobertson100] as you offered help :) Thanks

> Add a jenkins job for ElasticsearchIOIT v5
> --
>
> Key: BEAM-4435
> URL: https://issues.apache.org/jira/browse/BEAM-4435
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Etienne Chauchot
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4434) Add a jenkins job for ElasticsearchIOIT v2

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588575#comment-16588575
 ] 

Etienne Chauchot commented on BEAM-4434:


[~timrobertson100] as you offered help :) Thanks

> Add a jenkins job for ElasticsearchIOIT v2
> --
>
> Key: BEAM-4434
> URL: https://issues.apache.org/jira/browse/BEAM-4434
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Etienne Chauchot
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136875
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:54
Start Date: 22/Aug/18 08:54
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6211: [BEAM-5107] Support 
ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414961755
 
 
   @dattran-vn01 no need to close, just rebase and we will squash the commits 
in the end. Please do not close otherwise we lose review history


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136875)
Time Spent: 6h 10m  (was: 6h)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136876
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 08:55
Start Date: 22/Aug/18 08:55
Worklog Time Spent: 10m 
  Work Description: echauchot edited a comment on issue #6211: [BEAM-5107] 
Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#issuecomment-414961755
 
 
   @dattran-vn01 no need to close, just rebase and we will squash the commits 
in the end. Please do not close otherwise we lose review history. Fell free to 
ping me on slack for synchronous communication


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136876)
Time Spent: 6h 20m  (was: 6h 10m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5191) Add support for writing to BigQuery clustered tables

2018-08-22 Thread Robert Sahlin (JIRA)
Robert Sahlin created BEAM-5191:
---

 Summary: Add support for writing to BigQuery clustered tables
 Key: BEAM-5191
 URL: https://issues.apache.org/jira/browse/BEAM-5191
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Affects Versions: 2.6.0
Reporter: Robert Sahlin
Assignee: Chamikara Jayalath


Google recently added support for clustered tables in BigQuery. It would be 
useful to set clustering columns the same way as for partitioning. It should 
support multiple fields (4) for clustering.

For example:

[BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
 .withClustering(new Clustering().setField("productId").setType("STRING"))



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-3654:
--

Assignee: Reuven Lax

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5192) Support ES V7.

2018-08-22 Thread Etienne Chauchot (JIRA)
Etienne Chauchot created BEAM-5192:
--

 Summary: Support ES V7.
 Key: BEAM-5192
 URL: https://issues.apache.org/jira/browse/BEAM-5192
 Project: Beam
  Issue Type: Improvement
  Components: io-java-elasticsearch
Reporter: Etienne Chauchot
Assignee: Tim Robertson


ES v7 is not out yet. But Elastic team scheduled a breaking change for ES 7.0: 
the removal of the type feature. See 

[https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]

This will require a good amont of changes in the IO. 

This ticket is there to track the future work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-3654:
--

Assignee: (was: Reuven Lax)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-3654:
--

Assignee: Reuven Lax

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread Alexey Romanenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-3654:
--

Assignee: (was: Reuven Lax)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-3654:
--

Assignee: Mikhail

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5192) Support ES V7

2018-08-22 Thread Etienne Chauchot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-5192:
---
Summary: Support ES V7  (was: Support ES V7.)

> Support ES V7
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Tim Robertson
>Priority: Major
>
> ES v7 is not out yet. But Elastic team scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amont of changes in the IO. 
> This ticket is there to track the future work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5192) Support ES V7

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588653#comment-16588653
 ] 

Etienne Chauchot commented on BEAM-5192:


[~timrobertson100] I assigned this ticket to you regarding our conversation of 
yesterday

> Support ES V7
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Tim Robertson
>Priority: Major
>
> ES v7 is not out yet. But Elastic team scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amont of changes in the IO. 
> This ticket is there to track the future work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread Tim Robertson (JIRA)
Tim Robertson created BEAM-5193:
---

 Summary: KuduIO testWrite not correctly verifying behaviour
 Key: BEAM-5193
 URL: https://issues.apache.org/jira/browse/BEAM-5193
 Project: Beam
  Issue Type: Bug
  Components: io-ideas
Affects Versions: 2.6.0
Reporter: Tim Robertson
Assignee: Tim Robertson
 Fix For: 2.7.0


The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.

The following code block

{code:java}
for (int i = 1; i <= targetParallelism + 1; i++) {
  expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, 
i));
  expectedWriteLogs.verifyDebug(
  String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
  expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, 
i));
}
// verify all entries written
for (int n = 0; n > numberRecords; n++) {
  expectedWriteLogs.verifyDebug(
  String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
writer
}
{code}
 
Should have read:
{code:java}
for (int i = 1; i <= targetParallelism ; i++) {
  expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, 
i));
  expectedWriteLogs.verifyDebug(
  String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
  expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, 
i));
}
// verify all entries written
for (int n = 0; n < numberRecords; n++) {
  expectedWriteLogs.verifyDebug(
  String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
writer
}
{code}

This has gone unnoticed because 1) the {{targetParallelism}} is a function of 
cores available, and the test uses 3 only (which is the min in 
{{DirectRunner}}) and 2) the incorrect {{>}} in the test simply meant the loop 
was not run.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588659#comment-16588659
 ] 

Tim Robertson commented on BEAM-3654:
-

My apologies for the Kudu error.

I have lodged and will fix https://issues.apache.org/jira/browse/BEAM-5193 now 
which was simply a mistake.

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5194) Pipeline options with multi value are not deserialized correctly from map

2018-08-22 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5194:
--

 Summary: Pipeline options with multi value are not deserialized 
correctly from map
 Key: BEAM-5194
 URL: https://issues.apache.org/jira/browse/BEAM-5194
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ahmet Altay


[https://github.com/apache/beam/blob/7c41e0a915083bd3b1fe52c2a417fa38a00e6463/sdks/python/apache_beam/options/pipeline_options.py#L171]

 

Multiple options are converted to strings and added to flags which causes wrong 
deserialization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3026) Improve retrying in ElasticSearch client

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3026?focusedWorklogId=136884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136884
 ]

ASF GitHub Bot logged work on BEAM-3026:


Author: ASF GitHub Bot
Created on: 22/Aug/18 10:22
Start Date: 22/Aug/18 10:22
Worklog Time Spent: 10m 
  Work Description: aalbatross edited a comment on issue #6146: [BEAM-3026] 
Adding retrying behavior on ElasticSearchIO
URL: https://github.com/apache/beam/pull/6146#issuecomment-414956841
 
 
   @echauchot 
   I updated the changes
   a. added constants for test cases
   b. removed fail(), could not remove try-catch block as i need to assert 
against cause of exception (which is IOException), so keeping it unchanged.
   
   If there is a way i can start IT on jenkins ?
   I ran IT locally on my laptop sharing the screen shot here. 
   https://user-images.githubusercontent.com/1793534/44458189-b9f9c000-a605-11e8-85ed-f45a0d6c73ee.png";>
   https://user-images.githubusercontent.com/1793534/44458190-b9f9c000-a605-11e8-8043-41b2b3f9a084.png";>
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136884)
Time Spent: 12h 40m  (was: 12.5h)

> Improve retrying in ElasticSearch client
> 
>
> Key: BEAM-3026
> URL: https://issues.apache.org/jira/browse/BEAM-3026
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Currently an overloaded ES server will result in clients failing fast.
> I suggest implementing backoff pauses.  Perhaps something like this:
> {code}
> ElasticsearchIO.ConnectionConfiguration conn = 
> ElasticsearchIO.ConnectionConfiguration
>   .create(new String[]{"http://...:9200"}, "test", "test")
>   .retryWithWaitStrategy(WaitStrategies.exponentialBackoff(1000, 
> TimeUnit.MILLISECONDS)
>   .retryWithStopStrategy(StopStrategies.stopAfterAttempt(10)
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5194) Pipeline options with multi value are not deserialized correctly from map

2018-08-22 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-5194:
--

Assignee: Ankur Goenka  (was: Ahmet Altay)

> Pipeline options with multi value are not deserialized correctly from map
> -
>
> Key: BEAM-5194
> URL: https://issues.apache.org/jira/browse/BEAM-5194
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> [https://github.com/apache/beam/blob/7c41e0a915083bd3b1fe52c2a417fa38a00e6463/sdks/python/apache_beam/options/pipeline_options.py#L171]
>  
> Multiple options are converted to strings and added to flags which causes 
> wrong deserialization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5194) Pipeline options with multi value are not deserialized correctly from map

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5194?focusedWorklogId=136885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136885
 ]

ASF GitHub Bot logged work on BEAM-5194:


Author: ASF GitHub Bot
Created on: 22/Aug/18 10:24
Start Date: 22/Aug/18 10:24
Worklog Time Spent: 10m 
  Work Description: angoenka opened a new pull request #6262: [BEAM-5194] 
Fix Pipeline options with multi value deserialization from map
URL: https://github.com/apache/beam/pull/6262
 
 
   Multiple options are converted to strings and added to flags which causes 
wrong deserialization.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136885)
Time Spent: 10m
Remaining Estimate: 0h

> Pipeline options with multi value are not deserialized correctly from map
> -
>
> Key: BEAM-5194
> URL: https://issues.apache.org/jira/browse/BEAM-5194
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> 

[jira] [Work logged] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?focusedWorklogId=136888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136888
 ]

ASF GitHub Bot logged work on BEAM-5193:


Author: ASF GitHub Bot
Created on: 22/Aug/18 10:37
Start Date: 22/Aug/18 10:37
Worklog Time Spent: 10m 
  Work Description: timrobertson100 opened a new pull request #6263: 
[BEAM-5193] Correct typos in KuduIOTest testWrite
URL: https://github.com/apache/beam/pull/6263
 
 
   Corrects some typos which slipped in to the original Kudu commit in error 
and meant the kudu testWrite() was not doing what it was intended to do. 
   
   1) It has gone unnoticed presumably because test environments have had more 
than 3 processor cores. The intended design was to ensure only at least 3 which 
is the minimum parallelism that the DirectRunner will provide. 
   
   2) The second loop here was just a typo and never executed. This corrects 
that to verify behaviour.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136888)
Time Spent: 10m
Remaining Estimate: 0h

> KuduIO testWrite not correctly verifying behaviour
> --

[jira] [Work logged] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?focusedWorklogId=136889&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136889
 ]

ASF GitHub Bot logged work on BEAM-5193:


Author: ASF GitHub Bot
Created on: 22/Aug/18 10:37
Start Date: 22/Aug/18 10:37
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6263: [BEAM-5193] 
Correct typos in KuduIOTest testWrite
URL: https://github.com/apache/beam/pull/6263#issuecomment-414989146
 
 
   R: @iemejia 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136889)
Time Spent: 20m  (was: 10m)

> KuduIO testWrite not correctly verifying behaviour
> --
>
> Key: BEAM-5193
> URL: https://issues.apache.org/jira/browse/BEAM-5193
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Affects Versions: 2.6.0
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.
> The following code block
> {code:java}
> for (int i = 1; i <= targetParallelism + 1; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n > numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
>  
> Should have read:
> {code:java}
> for (int i = 1; i <= targetParallelism ; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n < numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
> This has gone unnoticed because 1) the {{targetParallelism}} is a function of 
> cores available, and the test uses 3 only (which is the min in 
> {{DirectRunner}}) and 2) the incorrect {{>}} in the test simply meant the 
> loop was not run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?focusedWorklogId=136897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136897
 ]

ASF GitHub Bot logged work on BEAM-5193:


Author: ASF GitHub Bot
Created on: 22/Aug/18 11:22
Start Date: 22/Aug/18 11:22
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6263: [BEAM-5193] Correct 
typos in KuduIOTest testWrite
URL: https://github.com/apache/beam/pull/6263#issuecomment-414999384
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136897)
Time Spent: 0.5h  (was: 20m)

> KuduIO testWrite not correctly verifying behaviour
> --
>
> Key: BEAM-5193
> URL: https://issues.apache.org/jira/browse/BEAM-5193
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Affects Versions: 2.6.0
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.
> The following code block
> {code:java}
> for (int i = 1; i <= targetParallelism + 1; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n > numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
>  
> Should have read:
> {code:java}
> for (int i = 1; i <= targetParallelism ; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n < numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
> This has gone unnoticed because 1) the {{targetParallelism}} is a function of 
> cores available, and the test uses 3 only (which is the min in 
> {{DirectRunner}}) and 2) the incorrect {{>}} in the test simply meant the 
> loop was not run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.

2018-08-22 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588725#comment-16588725
 ] 

Jozef Vilcek commented on BEAM-2277:


[~timrobertson100] I believe root cause and solution to this is BEAM-5180 and 
PR linked there.

> IllegalArgumentException when using Hadoop file system for WordCount example.
> -
>
> Key: BEAM-2277
> URL: https://issues.apache.org/jira/browse/BEAM-2277
> Project: Beam
>  Issue Type: Bug
>  Components: z-do-not-use-sdk-java-extensions
>Affects Versions: 2.6.0
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> IllegalArgumentException when using Hadoop file system for WordCount example.
> Occurred when running WordCount example using Spark runner on a YARN cluster.
> Command-line arguments:
> {code:none}
> --runner=SparkRunner --inputFile=hdfs:///user/myuser/kinglear.txt 
> --output=hdfs:///user/myuser/wc/wc
> {code}
> Stack trace:
> {code:none}
> java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds 
> have the same scheme, but received file, hdfs.
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:394)
>   at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:236)
>   at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:626)
>   at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:516)
>   at 
> org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostRelease_NightlySnapshot #348

2018-08-22 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-22 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588727#comment-16588727
 ] 

Jozef Vilcek commented on BEAM-5180:


Any though on this? Is this a non valid URI or ResourceId {{hdfs:/ }}? Given 
that authority component is optional, extra {{//}} can be dropped. 
{{java.net.URI}} parse that just fine.

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Recently this commit
> [https://github.com/apache/beam/commit/3fff58c21f94415f3397e185377e36d3df662384]
> introduced more strict schema parsing which is breaking the contract between 
> _FileResultCoder_ and _FileSystems.matchNewResource()_.
> Coder takes _ResourceId_ and serialize it via `_toString_` methods and then 
> relies on filesystem being able to parse it back again. Having strict 
> _scheme://_ breaks this at least for Hadoop filesystem which use _URI_ for 
> _ResourceId_ and produce _toString()_ in form of `_hdfs:/some/path_`
> I guess the _ResourceIdCoder_ is suffering the same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3919) checkpoint can not work with flink 1.4.1,1.4.2

2018-08-22 Thread Aljoscha Krettek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588744#comment-16588744
 ] 

Aljoscha Krettek commented on BEAM-3919:


Unfortunately, I don't think we can fix this right now with the incompatible 
changes in minor Flink versions.

> checkpoint can not work with flink 1.4.1,1.4.2
> --
>
> Key: BEAM-3919
> URL: https://issues.apache.org/jira/browse/BEAM-3919
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.3.0, 2.4.0
>Reporter: eisig
>Assignee: Harshal Tripathi
>Priority: Critical
>
> When submmit application to flink cluster(1.4.1,1.4.2) with checkpoint 
> enabled. 
> Job fail whith exception:
> java.lang.NoSuchMethodError: 
> org.apache.flink.streaming.api.operators.HeapInternalTimerService.snapshotTimersForKeyGroup(Lorg/apache/flink/core/memory/DataOutputViewStreamWrapper;I)V
>  
> It seems that 
> `org.apache.flink.streaming.api.operators.HeapInternalTimerService.snapshotTimersForKeyGroup`.
>   was changed in flink1.4.1.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2277) IllegalArgumentException when using Hadoop file system for WordCount example.

2018-08-22 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588751#comment-16588751
 ] 

Tim Robertson commented on BEAM-2277:
-

Thank you again [~JozoVilcek]

> IllegalArgumentException when using Hadoop file system for WordCount example.
> -
>
> Key: BEAM-2277
> URL: https://issues.apache.org/jira/browse/BEAM-2277
> Project: Beam
>  Issue Type: Bug
>  Components: z-do-not-use-sdk-java-extensions
>Affects Versions: 2.6.0
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> IllegalArgumentException when using Hadoop file system for WordCount example.
> Occurred when running WordCount example using Spark runner on a YARN cluster.
> Command-line arguments:
> {code:none}
> --runner=SparkRunner --inputFile=hdfs:///user/myuser/kinglear.txt 
> --output=hdfs:///user/myuser/wc/wc
> {code}
> Stack trace:
> {code:none}
> java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds 
> have the same scheme, but received file, hdfs.
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>   at 
> org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:394)
>   at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:236)
>   at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:626)
>   at 
> org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:516)
>   at 
> org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5195) Add support for `TopPerKey` operator

2018-08-22 Thread Vaclav Plajt (JIRA)
Vaclav Plajt created BEAM-5195:
--

 Summary: Add support for `TopPerKey` operator
 Key: BEAM-5195
 URL: https://issues.apache.org/jira/browse/BEAM-5195
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-euphoria
Reporter: Vaclav Plajt
Assignee: Vaclav Plajt


`TopPerKey` operator is not supported due to its decomposition to 
`ReduceStateByKey` operator which is not supported. That decomposition is wrong 
since it outputs one value per key so no state is needed to perform the 
reduction.

Change decomposition of `TopPerKey` to `ReduceByKey`. That will make 
`TopPerKey` translatable to Beam API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?focusedWorklogId=136922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136922
 ]

ASF GitHub Bot logged work on BEAM-5193:


Author: ASF GitHub Bot
Created on: 22/Aug/18 12:10
Start Date: 22/Aug/18 12:10
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6263: [BEAM-5193] 
Correct typos in KuduIOTest testWrite
URL: https://github.com/apache/beam/pull/6263#issuecomment-415010383
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136922)
Time Spent: 40m  (was: 0.5h)

> KuduIO testWrite not correctly verifying behaviour
> --
>
> Key: BEAM-5193
> URL: https://issues.apache.org/jira/browse/BEAM-5193
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Affects Versions: 2.6.0
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.
> The following code block
> {code:java}
> for (int i = 1; i <= targetParallelism + 1; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n > numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
>  
> Should have read:
> {code:java}
> for (int i = 1; i <= targetParallelism ; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n < numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
> This has gone unnoticed because 1) the {{targetParallelism}} is a function of 
> cores available, and the test uses 3 only (which is the min in 
> {{DirectRunner}}) and 2) the incorrect {{>}} in the test simply meant the 
> loop was not run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?focusedWorklogId=136925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136925
 ]

ASF GitHub Bot logged work on BEAM-3654:


Author: ASF GitHub Bot
Created on: 22/Aug/18 12:15
Start Date: 22/Aug/18 12:15
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #6194: [BEAM-3654] 
Port FilterExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6194#issuecomment-415011624
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136925)
Time Spent: 20m  (was: 10m)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PreCommit_Java_Cron #257

2018-08-22 Thread Apache Jenkins Server
See 


--
[...truncated 17.64 MB...]
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:02.991Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/ExpandIterable
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.038Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Pair
 with random key
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.079Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Write
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.124Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.160Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Values/Values/Map
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/ExpandIterable
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.190Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/FinalizeTempFileBundles/Finalize 
into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Values/Values/Map
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.236Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Add void 
key/AddKeys/Map into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnshardedBundles
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.286Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Reshuffle/Window.Into()/Window.Assign
 into WriteOneFilePerWindow/TextIO.Write/WriteFiles/GatherTempFileResults/Add 
void key/AddKeys/Map
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.332Z: Fusing consumer 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey+WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues/Partial
 into WordCount.CountWords/Count.PerElement/Init/Map
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.609Z: Fusing consumer 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/GroupUnwritten/Reify
 into 
WriteOneFilePerWindow/TextIO.Write/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnshardedBundles
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.655Z: Fusing consumer Window.Into()/Window.Assign 
into ParDo(AddTimestamp)
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.712Z: Fusing consumer 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues
 into 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey/Read
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T12:19:03.753Z: Fusing consumer 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey/Reify 
into 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey+WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues/Partial
Aug 22, 2018 12:19:10 PM 
org.apache.beam.runner

Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #1317

2018-08-22 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?focusedWorklogId=136934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136934
 ]

ASF GitHub Bot logged work on BEAM-3654:


Author: ASF GitHub Bot
Created on: 22/Aug/18 12:48
Start Date: 22/Aug/18 12:48
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6194: [BEAM-3654] Port FilterExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6194#discussion_r211930252
 
 

 ##
 File path: 
examples/java/src/test/java/org/apache/beam/examples/cookbook/FilterExamplesTest.java
 ##
 @@ -54,31 +57,35 @@
   .set("year", "2014")
   .set("mean_temp", "45.3")
   .set("tornado", true);
-  static final TableRow[] ROWS_ARRAY = new TableRow[] {row1, row2, row3};
-  static final List ROWS = Arrays.asList(ROWS_ARRAY);
 
   private static final TableRow outRow1 =
   new TableRow().set("year", 2014).set("month", 6).set("day", 
21).set("mean_temp", 85.3);
   private static final TableRow outRow2 =
   new TableRow().set("year", 2014).set("month", 7).set("day", 
20).set("mean_temp", 75.4);
   private static final TableRow outRow3 =
   new TableRow().set("year", 2014).set("month", 6).set("day", 
18).set("mean_temp", 45.3);
-  private static final TableRow[] PROJROWS_ARRAY = new TableRow[] {outRow1, 
outRow2, outRow3};
+
+  @Rule public TestPipeline p = TestPipeline.create();
 
   @Test
-  public void testProjectionFn() throws Exception {
-DoFnTester projectionFn = DoFnTester.of(new 
ProjectionFn());
-List results = projectionFn.processBundle(ROWS_ARRAY);
-Assert.assertThat(results, CoreMatchers.hasItem(outRow1));
-Assert.assertThat(results, CoreMatchers.hasItem(outRow2));
-Assert.assertThat(results, CoreMatchers.hasItem(outRow3));
+  @Category(ValidatesRunner.class)
+  public void testProjectionFn1() {
+PCollection input = p.apply(Create.of(row1, row2, row3));
+
+PCollection results = input.apply(ParDo.of(new ProjectionFn()));
+
+PAssert.that(results).containsInAnyOrder(outRow1, outRow2, outRow3);
+p.run().waitUntilFinish();
   }
 
   @Test
-  public void testFilterSingleMonthDataFn() throws Exception {
-DoFnTester filterSingleMonthDataFn =
-DoFnTester.of(new FilterSingleMonthDataFn(7));
-List results = 
filterSingleMonthDataFn.processBundle(PROJROWS_ARRAY);
-Assert.assertThat(results, CoreMatchers.hasItem(outRow2));
+  @Category(ValidatesRunner.class)
+  public void testFilterSingleMonthDataFn1() {
 
 Review comment:
   Any reason that the method name ends with "1"? I'd prefer to keep old name.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136934)
Time Spent: 40m  (was: 0.5h)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?focusedWorklogId=136933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136933
 ]

ASF GitHub Bot logged work on BEAM-3654:


Author: ASF GitHub Bot
Created on: 22/Aug/18 12:48
Start Date: 22/Aug/18 12:48
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#6194: [BEAM-3654] Port FilterExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6194#discussion_r211928636
 
 

 ##
 File path: 
examples/java/src/test/java/org/apache/beam/examples/cookbook/FilterExamplesTest.java
 ##
 @@ -54,31 +57,35 @@
   .set("year", "2014")
   .set("mean_temp", "45.3")
   .set("tornado", true);
-  static final TableRow[] ROWS_ARRAY = new TableRow[] {row1, row2, row3};
-  static final List ROWS = Arrays.asList(ROWS_ARRAY);
 
   private static final TableRow outRow1 =
   new TableRow().set("year", 2014).set("month", 6).set("day", 
21).set("mean_temp", 85.3);
   private static final TableRow outRow2 =
   new TableRow().set("year", 2014).set("month", 7).set("day", 
20).set("mean_temp", 75.4);
   private static final TableRow outRow3 =
   new TableRow().set("year", 2014).set("month", 6).set("day", 
18).set("mean_temp", 45.3);
-  private static final TableRow[] PROJROWS_ARRAY = new TableRow[] {outRow1, 
outRow2, outRow3};
+
+  @Rule public TestPipeline p = TestPipeline.create();
 
   @Test
-  public void testProjectionFn() throws Exception {
-DoFnTester projectionFn = DoFnTester.of(new 
ProjectionFn());
-List results = projectionFn.processBundle(ROWS_ARRAY);
-Assert.assertThat(results, CoreMatchers.hasItem(outRow1));
-Assert.assertThat(results, CoreMatchers.hasItem(outRow2));
-Assert.assertThat(results, CoreMatchers.hasItem(outRow3));
+  @Category(ValidatesRunner.class)
+  public void testProjectionFn1() {
 
 Review comment:
   Any reason that the method name ends with "1"? I'd prefer to keep old name.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136933)
Time Spent: 0.5h  (was: 20m)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?focusedWorklogId=136936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136936
 ]

ASF GitHub Bot logged work on BEAM-5193:


Author: ASF GitHub Bot
Created on: 22/Aug/18 12:53
Start Date: 22/Aug/18 12:53
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6263: [BEAM-5193] 
Correct typos in KuduIOTest testWrite
URL: https://github.com/apache/beam/pull/6263#issuecomment-415021209
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136936)
Time Spent: 50m  (was: 40m)

> KuduIO testWrite not correctly verifying behaviour
> --
>
> Key: BEAM-5193
> URL: https://issues.apache.org/jira/browse/BEAM-5193
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Affects Versions: 2.6.0
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.
> The following code block
> {code:java}
> for (int i = 1; i <= targetParallelism + 1; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n > numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
>  
> Should have read:
> {code:java}
> for (int i = 1; i <= targetParallelism ; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n < numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
> This has gone unnoticed because 1) the {{targetParallelism}} is a function of 
> cores available, and the test uses 3 only (which is the min in 
> {{DirectRunner}}) and 2) the incorrect {{>}} in the test simply meant the 
> loop was not run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #6263: [BEAM-5193] Correct typos in KuduIOTest testWrite

2018-08-22 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 266942cedbaa8dad200096046d32f4885f5d2523
Merge: 7c41e0a 25ef933
Author: Ismaël Mejía 
AuthorDate: Wed Aug 22 15:36:47 2018 +0200

Merge pull request #6263: [BEAM-5193] Correct typos in KuduIOTest testWrite

 .../src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)



[beam] branch master updated (7c41e0a -> 266942c)

2018-08-22 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 7c41e0a  [BEAM-5110] Explicitly count the references for 
BatchFlinkExecutableStageContext … (#6189)
 add 25ef933  [BEAM-5193] Correct typos in KuduIOTest testWrite
 new 266942c  Merge pull request #6263: [BEAM-5193] Correct typos in 
KuduIOTest testWrite

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)



[jira] [Work logged] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?focusedWorklogId=136945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136945
 ]

ASF GitHub Bot logged work on BEAM-5193:


Author: ASF GitHub Bot
Created on: 22/Aug/18 13:36
Start Date: 22/Aug/18 13:36
Worklog Time Spent: 10m 
  Work Description: iemejia closed pull request #6263: [BEAM-5193] Correct 
typos in KuduIOTest testWrite
URL: https://github.com/apache/beam/pull/6263
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java 
b/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java
index a39b87b4e22..55a2b72af15 100644
--- 
a/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java
+++ 
b/sdks/java/io/kudu/src/test/java/org/apache/beam/sdk/io/kudu/KuduIOTest.java
@@ -125,9 +125,9 @@ public void testRead() throws KuduException {
   }
 
   /**
-   * Test the write path using a {@link FakeWriter} and verifying the expected 
log statements are
+   * Test the write path using a {@link FakeWriter} and verifies the expected 
log statements are
* written. This test ensures that the {@link KuduIO} correctly respects 
parallelism by
-   * deserializes writers and that each writer is opening and closing Kudu 
sessions.
+   * deserializing writers and that each writer is opening and closing Kudu 
sessions.
*/
   @Test
   public void testWrite() throws Exception {
@@ -144,14 +144,14 @@ public void testWrite() throws Exception {
 .withKuduService(mockWriteService));
 writePipeline.run().waitUntilFinish();
 
-for (int i = 1; i <= targetParallelism + 1; i++) {
+for (int i = 1; i <= targetParallelism; i++) {
   expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, 
i));
   expectedWriteLogs.verifyDebug(
   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
   
expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
 }
 // verify all entries written
-for (int n = 0; n > numberRecords; n++) {
+for (int n = 0; n < numberRecords; n++) {
   expectedWriteLogs.verifyDebug(
   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
writer
 }


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136945)
Time Spent: 1h  (was: 50m)

> KuduIO testWrite not correctly verifying behaviour
> --
>
> Key: BEAM-5193
> URL: https://issues.apache.org/jira/browse/BEAM-5193
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Affects Versions: 2.6.0
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.
> The following code block
> {code:java}
> for (int i = 1; i <= targetParallelism + 1; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n > numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
>  
> Should have read:
> {code:java}
> for (int i = 1; i <= targetParallelism ; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n < numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
> This has gone unnoticed because 1) the {{targetParallelism}} is a

[jira] [Resolved] (BEAM-5193) KuduIO testWrite not correctly verifying behaviour

2018-08-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-5193.

Resolution: Fixed

> KuduIO testWrite not correctly verifying behaviour
> --
>
> Key: BEAM-5193
> URL: https://issues.apache.org/jira/browse/BEAM-5193
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Affects Versions: 2.6.0
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{testWrite}} in {{KuduIOTest}} has 2 typos which were committed in error.
> The following code block
> {code:java}
> for (int i = 1; i <= targetParallelism + 1; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n > numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
>  
> Should have read:
> {code:java}
> for (int i = 1; i <= targetParallelism ; i++) {
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_OPEN_SESSION, i));
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE, i)); // at least one per writer
>   
> expectedWriteLogs.verifyDebug(String.format(FakeWriter.LOG_CLOSE_SESSION, i));
> }
> // verify all entries written
> for (int n = 0; n < numberRecords; n++) {
>   expectedWriteLogs.verifyDebug(
>   String.format(FakeWriter.LOG_WRITE_VALUE, n)); // at least one per 
> writer
> }
> {code}
> This has gone unnoticed because 1) the {{targetParallelism}} is a function of 
> cores available, and the test uses 3 only (which is the min in 
> {{DirectRunner}}) and 2) the incorrect {{>}} in the test simply meant the 
> loop was not run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5190) Python pipeline options are not picked correctly by PortableRunner

2018-08-22 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588862#comment-16588862
 ] 

Thomas Weise commented on BEAM-5190:


Yep, options need to be rewritten during submission because the runner needs to 
be able to read them also.

BTW I have also seen the deadlock symptom while working on side inputs, when 
attempting to run wordcount (in streaming mode). 

> Python pipeline options are not picked correctly by PortableRunner
> --
>
> Key: BEAM-5190
> URL: https://issues.apache.org/jira/browse/BEAM-5190
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> Python SDK worker is deserializing the pipeline options to dictionary instead 
> of PipelineOptions
> Sample log
> [grpc-default-executor-2] INFO sdk_worker_main.main - Python sdk harness 
> started with pipeline_options: \{u'beam:option:flink_master:v1': u'[auto]', 
> u'beam:option:streaming:v1': False, u'beam:option:experiments:v1': 
> [u'beam_fn_api', u'worker_threads=50'], u'beam:option:dry_run:v1': False, 
> u'beam:option:runner:v1': None, u'beam:option:profile_memory:v1': False, 
> u'beam:option:runtime_type_check:v1': False, u'beam:option:region:v1': 
> u'us-central1', u'beam:option:options_id:v1': 1, u'beam:option:no_auth:v1': 
> False, u'beam:option:dataflow_endpoint:v1': 
> u'https://dataflow.googleapis.com', u'beam:option:sdk_location:v1': 
> u'/usr/local/google/home/goenka/d/work/beam/beam/sdks/python/dist/apache-beam-2.7.0.dev0.tar.gz',
>  u'beam:option:direct_runner_use_stacked_bundle:v1': True, 
> u'beam:option:save_main_session:v1': True, 
> u'beam:option:type_check_strictness:v1': u'DEFAULT_TO_ANY', 
> u'beam:option:profile_cpu:v1': False, u'beam:option:job_endpoint:v1': 
> u'localhost:8099', u'beam:option:job_name:v1': 
> u'BeamApp-goenka-0822071645-48ae1008', u'beam:option:temp_location:v1': 
> u'gs://clouddfe-goenka/tmp/', u'beam:option:app_name:v1': None, 
> u'beam:option:project:v1': u'google.com:clouddfe', 
> u'beam:option:pipeline_type_check:v1': True, 
> u'beam:option:staging_location:v1': u'gs://clouddfe-goenka/tmp/staging'} 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread JIRA
Ismaël Mejía created BEAM-5196:
--

 Summary: Add MD5 consistency check on S3 uploads (writes)
 Key: BEAM-5196
 URL: https://issues.apache.org/jira/browse/BEAM-5196
 Project: Beam
  Issue Type: Improvement
  Components: io-java-aws
Affects Versions: 2.7.0
Reporter: Ismaël Mejía
Assignee: Leen Toelen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588659#comment-16588659
 ] 

Tim Robertson edited comment on BEAM-3654 at 8/22/18 1:51 PM:
--

My apologies for the Kudu error.

-I have lodged and will fix https://issues.apache.org/jira/browse/BEAM-5193 now 
which was simply a mistake.-

Now fixed on master

 

 


was (Author: timrobertson100):
My apologies for the Kudu error.

I have lodged and will fix https://issues.apache.org/jira/browse/BEAM-5193 now 
which was simply a mistake.

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?focusedWorklogId=136950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136950
 ]

ASF GitHub Bot logged work on BEAM-3654:


Author: ASF GitHub Bot
Created on: 22/Aug/18 13:52
Start Date: 22/Aug/18 13:52
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6194: [BEAM-3654] 
Port FilterExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6194#issuecomment-415039138
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136950)
Time Spent: 50m  (was: 40m)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated: [BEAM-5196] Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 670cb19  [BEAM-5196] Add MD5 consistency check on S3 uploads (writes)
 new 90ca21e  Merge pull request #6232: [BEAM-5196] Add MD5 consistency 
check on S3 uploads (writes)
670cb19 is described below

commit 670cb197ba74a79ae73085a3c4db5040c8e55904
Author: Leen Toelen 
AuthorDate: Wed Aug 15 22:03:51 2018 +0200

[BEAM-5196] Add MD5 consistency check on S3 uploads (writes)
---
 sdks/java/io/amazon-web-services/build.gradle  |  1 +
 .../beam/sdk/io/aws/s3/S3WritableByteChannel.java  | 15 +
 .../beam/sdk/io/aws/s3/S3FileSystemTest.java   | 65 ++
 .../org/apache/beam/sdk/io/aws/s3/S3TestUtils.java |  6 +-
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/sdks/java/io/amazon-web-services/build.gradle 
b/sdks/java/io/amazon-web-services/build.gradle
index cbcc97d..973d115 100644
--- a/sdks/java/io/amazon-web-services/build.gradle
+++ b/sdks/java/io/amazon-web-services/build.gradle
@@ -42,4 +42,5 @@ dependencies {
   shadowTest library.java.mockito_core
   shadowTest library.java.junit
   shadowTest library.java.slf4j_jdk14
+  testCompile "io.findify:s3mock_2.11:0.2.4"
 }
diff --git 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
 
b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
index 6c1fc71..c061adc 100644
--- 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
+++ 
b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
@@ -29,12 +29,15 @@ import com.amazonaws.services.s3.model.ObjectMetadata;
 import com.amazonaws.services.s3.model.PartETag;
 import com.amazonaws.services.s3.model.UploadPartRequest;
 import com.amazonaws.services.s3.model.UploadPartResult;
+import com.amazonaws.util.Base64;
 import com.google.common.annotations.VisibleForTesting;
 import java.io.ByteArrayInputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.nio.channels.ClosedChannelException;
 import java.nio.channels.WritableByteChannel;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.beam.sdk.io.aws.options.S3Options;
@@ -53,6 +56,7 @@ class S3WritableByteChannel implements WritableByteChannel {
   // AWS S3 parts are 1-indexed, not zero-indexed.
   private int partNumber = 1;
   private boolean open = true;
+  private final MessageDigest md5 = md5();
 
   S3WritableByteChannel(AmazonS3 amazonS3, S3ResourceId path, String 
contentType, S3Options options)
   throws IOException {
@@ -95,6 +99,14 @@ class S3WritableByteChannel implements WritableByteChannel {
 uploadId = result.getUploadId();
   }
 
+  private static MessageDigest md5() {
+try {
+  return MessageDigest.getInstance("MD5");
+} catch (NoSuchAlgorithmException e) {
+  throw new IllegalStateException(e);
+}
+  }
+
   @Override
   public int write(ByteBuffer sourceBuffer) throws IOException {
 if (!isOpen()) {
@@ -109,6 +121,7 @@ class S3WritableByteChannel implements WritableByteChannel {
   byte[] copyBuffer = new byte[bytesWritten];
   sourceBuffer.get(copyBuffer);
   uploadBuffer.put(copyBuffer);
+  md5.update(copyBuffer);
 
   if (!uploadBuffer.hasRemaining() || sourceBuffer.hasRemaining()) {
 flush();
@@ -129,6 +142,7 @@ class S3WritableByteChannel implements WritableByteChannel {
 .withUploadId(uploadId)
 .withPartNumber(partNumber++)
 .withPartSize(uploadBuffer.remaining())
+.withMD5Digest(Base64.encodeAsString(md5.digest()))
 .withInputStream(inputStream);
 request.setSSECustomerKey(options.getSSECustomerKey());
 
@@ -139,6 +153,7 @@ class S3WritableByteChannel implements WritableByteChannel {
   throw new IOException(e);
 }
 uploadBuffer.clear();
+md5.reset();
 eTags.add(result.getPartETag());
   }
 
diff --git 
a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
 
b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
index 63f69b5..0abf217 100644
--- 
a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
+++ 
b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
@@ -22,9 +22,11 @@ import static 
org.apache.beam.sdk.io.aws.s3.S3TestUtils.buildMockedS3FileSystem;
 import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.getSSECustomerKeyMd5;
 import static org.apache.beam.sdk.io.aws.s3.S3TestUti

[jira] [Work logged] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5196?focusedWorklogId=136952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136952
 ]

ASF GitHub Bot logged work on BEAM-5196:


Author: ASF GitHub Bot
Created on: 22/Aug/18 13:57
Start Date: 22/Aug/18 13:57
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6232: [BEAM-5196]  Add MD5 
consistency check on S3 uploads (writes)
URL: https://github.com/apache/beam/pull/6232#issuecomment-415040971
 
 
   I merged manually to do some minor final touches, so it will look closed 
(not merged) in the github UI but it is ok.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136952)
Time Spent: 10m
Remaining Estimate: 0h

> Add MD5 consistency check on S3 uploads (writes)
> 
>
> Key: BEAM-5196
> URL: https://issues.apache.org/jira/browse/BEAM-5196
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Affects Versions: 2.7.0
>Reporter: Ismaël Mejía
>Assignee: Leen Toelen
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5196?focusedWorklogId=136953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136953
 ]

ASF GitHub Bot logged work on BEAM-5196:


Author: ASF GitHub Bot
Created on: 22/Aug/18 13:57
Start Date: 22/Aug/18 13:57
Worklog Time Spent: 10m 
  Work Description: iemejia closed pull request #6232: [BEAM-5196]  Add MD5 
consistency check on S3 uploads (writes)
URL: https://github.com/apache/beam/pull/6232
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/io/amazon-web-services/build.gradle 
b/sdks/java/io/amazon-web-services/build.gradle
index cbcc97df0ce..973d115dbf2 100644
--- a/sdks/java/io/amazon-web-services/build.gradle
+++ b/sdks/java/io/amazon-web-services/build.gradle
@@ -42,4 +42,5 @@ dependencies {
   shadowTest library.java.mockito_core
   shadowTest library.java.junit
   shadowTest library.java.slf4j_jdk14
+  testCompile "io.findify:s3mock_2.11:0.2.4"
 }
diff --git 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
 
b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
index 6c1fc7188f8..c061adc527b 100644
--- 
a/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
+++ 
b/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3WritableByteChannel.java
@@ -29,12 +29,15 @@
 import com.amazonaws.services.s3.model.PartETag;
 import com.amazonaws.services.s3.model.UploadPartRequest;
 import com.amazonaws.services.s3.model.UploadPartResult;
+import com.amazonaws.util.Base64;
 import com.google.common.annotations.VisibleForTesting;
 import java.io.ByteArrayInputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.nio.channels.ClosedChannelException;
 import java.nio.channels.WritableByteChannel;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.beam.sdk.io.aws.options.S3Options;
@@ -53,6 +56,7 @@
   // AWS S3 parts are 1-indexed, not zero-indexed.
   private int partNumber = 1;
   private boolean open = true;
+  private final MessageDigest md5 = md5();
 
   S3WritableByteChannel(AmazonS3 amazonS3, S3ResourceId path, String 
contentType, S3Options options)
   throws IOException {
@@ -95,6 +99,14 @@
 uploadId = result.getUploadId();
   }
 
+  private static MessageDigest md5() {
+try {
+  return MessageDigest.getInstance("MD5");
+} catch (NoSuchAlgorithmException e) {
+  throw new IllegalStateException(e);
+}
+  }
+
   @Override
   public int write(ByteBuffer sourceBuffer) throws IOException {
 if (!isOpen()) {
@@ -109,6 +121,7 @@ public int write(ByteBuffer sourceBuffer) throws 
IOException {
   byte[] copyBuffer = new byte[bytesWritten];
   sourceBuffer.get(copyBuffer);
   uploadBuffer.put(copyBuffer);
+  md5.update(copyBuffer);
 
   if (!uploadBuffer.hasRemaining() || sourceBuffer.hasRemaining()) {
 flush();
@@ -129,6 +142,7 @@ private void flush() throws IOException {
 .withUploadId(uploadId)
 .withPartNumber(partNumber++)
 .withPartSize(uploadBuffer.remaining())
+.withMD5Digest(Base64.encodeAsString(md5.digest()))
 .withInputStream(inputStream);
 request.setSSECustomerKey(options.getSSECustomerKey());
 
@@ -139,6 +153,7 @@ private void flush() throws IOException {
   throw new IOException(e);
 }
 uploadBuffer.clear();
+md5.reset();
 eTags.add(result.getPartETag());
   }
 
diff --git 
a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
 
b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
index 63f69b55c0d..0abf2170a36 100644
--- 
a/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
+++ 
b/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java
@@ -22,9 +22,11 @@
 import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.getSSECustomerKeyMd5;
 import static org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3Options;
 import static 
org.apache.beam.sdk.io.aws.s3.S3TestUtils.s3OptionsWithSSECustomerKey;
+import static 
org.apache.beam.sdk.io.fs.CreateOptions.StandardCreateOptions.builder;
 import static org.hamcrest.Matchers.contains;
 import static org.hamcrest.Matchers.notNullValue;
 import static org.hamcrest.Matchers.nullValue;
+import static org.junit.Assert.assert

[jira] [Resolved] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-5196.

   Resolution: Fixed
Fix Version/s: 2.7.0

> Add MD5 consistency check on S3 uploads (writes)
> 
>
> Key: BEAM-5196
> URL: https://issues.apache.org/jira/browse/BEAM-5196
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Leen Toelen
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-5196:
---
Affects Version/s: (was: 2.7.0)

> Add MD5 consistency check on S3 uploads (writes)
> 
>
> Key: BEAM-5196
> URL: https://issues.apache.org/jira/browse/BEAM-5196
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Leen Toelen
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #1318

2018-08-22 Thread Apache Jenkins Server
See 


Changes:

[timrobertson100] [BEAM-5193] Correct typos in KuduIOTest testWrite

--
[...truncated 20.47 MB...]
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.464Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.509Z: Elided trivial flatten 
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.552Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.597Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.632Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.678Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.726Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.774Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.826Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.874Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.926Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
Aug 22, 2018 2:19:25 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T14:19:19.965Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton

[jira] [Work logged] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?focusedWorklogId=136967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136967
 ]

ASF GitHub Bot logged work on BEAM-3654:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:42
Start Date: 22/Aug/18 14:42
Worklog Time Spent: 10m 
  Work Description: SokolovMS commented on issue #6194: [BEAM-3654] Port 
FilterExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6194#issuecomment-415056693
 
 
   Applied remarks, added issue ID to squashed commit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136967)
Time Spent: 1h  (was: 50m)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5062) Add ability to configure S3ClientOptions

2018-08-22 Thread Kirill Kozlov (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-5062:

Affects Version/s: (was: 2.5.0)
  Description: 
It would be very useful to have an ability to configure 
[S3ClientOptions|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/S3ClientOptions.html]
 for Apache Beam jobs.

For example, there are some implementations of S3, that does not support 
virtual-hosted-style URLs for buckets, only path-style. Currently it's 
impossible to enable path style access for amazon s3 client, which is used by 
an apache-beam job.

  was:
There are some implementations of S3, that does not support 
virtual-hosted-style URLs for buckets, only path-style.

It would be useful to add one more option into S3Options to enable path-style 
access for buckets.

[withPathStyleAccessEnabled|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#withPathStyleAccessEnabled-java.lang.Boolean-]
 method could be used for it when building client.

  Summary: Add ability to configure S3ClientOptions  (was: Add 
S3Options to enable path style access)

After a discussion in github PR, we decided to generalize this improvement to 
have an ability to configure all Amazon S3 client options.

> Add ability to configure S3ClientOptions
> 
>
> Key: BEAM-5062
> URL: https://issues.apache.org/jira/browse/BEAM-5062
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> It would be very useful to have an ability to configure 
> [S3ClientOptions|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/S3ClientOptions.html]
>  for Apache Beam jobs.
> For example, there are some implementations of S3, that does not support 
> virtual-hosted-style URLs for buckets, only path-style. Currently it's 
> impossible to enable path style access for amazon s3 client, which is used by 
> an apache-beam job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136972
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211969581
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java
 ##
 @@ -374,6 +433,8 @@ public String apply(JsonNode input) {
* Tests that documents are dynamically routed to different types and not 
the type that is given
* in the configuration. Documents should be routed to the a type of type_0 
or type_1 using a
* modulo approach of the explicit id.
+   *
+   * This test does not work with ES 6 because multiple type support within 
an index was removed.
*/
   void testWriteWithTypeFn() throws Exception {
 
 Review comment:
   rename testWriteWithTypeFn2x5x


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136972)
Time Spent: 7h  (was: 6h 40m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136968
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211981206
 
 

 ##
 File path: sdks/java/io/elasticsearch/build.gradle
 ##
 @@ -27,7 +27,7 @@ dependencies {
   shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
   shadow library.java.jackson_databind
   shadow library.java.jackson_annotations
-  shadow "org.elasticsearch.client:elasticsearch-rest-client:5.6.3"
+  shadow "org.elasticsearch.client:elasticsearch-rest-client:6.3.2"
 
 Review comment:
   Nice to see that REST client v 6.3.2 is still retro-compatible with ES2 and 
ES5. I chose this low level client over the other high level client in the 
first place to have the common set of production libraries between all the 
versions. Only test dependencies differ


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136968)
Time Spent: 6.5h  (was: 6h 20m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136969
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211890438
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -125,7 +125,8 @@
  *
  * Optionally, you can provide {@link 
ElasticsearchIO.Write.FieldValueExtractFn} using {@code
  * withIndexFn()} or {@code withTypeFn()} to enable per-document routing to 
the target Elasticsearch
- * index and type.
+ * index (all versions) and type (version <6). Support for type routing was 
removed in Elasticsearch
+ * 6 (see 
https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch)
 
 Review comment:
   Good catch ! That being said, there is not only type routing that will be 
removed but all the type feature in ES ! It will be removed in 7.0. As we know 
the IO will break in 7.0 I prefer that you put `version == 5  || version == 6` 
everywhere you put `version >= 5` to avoid that the users use the IO on 
incompatible ES v7.  We will then do another PR to support ES v7 on the IO. I 
just opened this ticket to track the future migration: 
https://issues.apache.org/jira/browse/BEAM-5192. @timrobertson100 I assigned it 
to you regarding our conversation of yesterday.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136969)
Time Spent: 6h 40m  (was: 6.5h)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136974
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211965296
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.elasticsearch;
+
+import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration;
+
+import 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOITCommon.ElasticsearchPipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.elasticsearch.client.RestClient;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+
+/**
+ * A test of {@link ElasticsearchIO} on an independent Elasticsearch v6.x 
instance.
+ *
+ * This test requires a running instance of Elasticsearch, and the test 
dataset must exist in the
+ * database. See {@link ElasticsearchIOITCommon} for instructions to achieve 
this.
+ *
+ * You can run this test by doing the following from the beam parent module 
directory with the
+ * correct server IP:
+ *
+ * 
+ *  ./gradlew integrationTest -p 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-6
+ *  -DintegrationTestPipelineOptions='[
+ *  "--elasticsearchServer=1.2.3.4",
+ *  "--elasticsearchHttpPort=9200"]'
+ *  --tests org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOIT
+ *  -DintegrationTestRunner=direct
+ * 
+ *
+ * It is likely that you will need to configure 
thread_pool.bulk.queue_size: 250 (or
+ * higher) in the backend Elasticsearch server for this test to run.
+ */
+public class ElasticsearchIOIT {
+  private static RestClient restClient;
+  private static ElasticsearchPipelineOptions options;
+  private static ConnectionConfiguration readConnectionConfiguration;
+  private static ConnectionConfiguration writeConnectionConfiguration;
+  private static ConnectionConfiguration updateConnectionConfiguration;
+  private static ElasticsearchIOTestCommon elasticsearchIOTestCommon;
+
+  @Rule public TestPipeline pipeline = TestPipeline.create();
+
+  @BeforeClass
+  public static void beforeClass() throws Exception {
+PipelineOptionsFactory.register(ElasticsearchPipelineOptions.class);
+options = 
TestPipeline.testingPipelineOptions().as(ElasticsearchPipelineOptions.class);
+readConnectionConfiguration =
+ElasticsearchIOITCommon.getConnectionConfiguration(
+options, ElasticsearchIOITCommon.IndexMode.READ);
+writeConnectionConfiguration =
+ElasticsearchIOITCommon.getConnectionConfiguration(
+options, ElasticsearchIOITCommon.IndexMode.WRITE);
+updateConnectionConfiguration =
+ElasticsearchIOITCommon.getConnectionConfiguration(
+options, ElasticsearchIOITCommon.IndexMode.WRITE_PARTIAL);
+restClient = readConnectionConfiguration.createClient();
+elasticsearchIOTestCommon =
+new ElasticsearchIOTestCommon(readConnectionConfiguration, restClient, 
true);
+  }
+
+  @AfterClass
+  public static void afterClass() throws Exception {
+ElasticSearchIOTestUtils.deleteIndex(writeConnectionConfiguration, 
restClient);
+ElasticSearchIOTestUtils.deleteIndex(updateConnectionConfiguration, 
restClient);
+restClient.close();
+  }
+
+  @Test
+  public void testSplitsVolume() throws Exception {
+elasticsearchIOTestCommon.testITSplit5x6x(options);
+  }
+
+  @Test
+  public void testReadVolume() throws Exception {
+elasticsearchIOTestCommon.setPipeline(pipeline);
+elasticsearchIOTestCommon.testRead();
+  }
+
+  @Test
+  pub

[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136971
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211980328
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java
 ##
 @@ -133,8 +133,7 @@ void testReadWithQuery() throws Exception {
 + "  \"query\": {\n"
 + "  \"match\" : {\n"
 + "\"scientist\" : {\n"
-+ "  \"query\" : \"Einstein\",\n"
-+ "  \"type\" : \"boolean\"\n"
++ "  \"query\" : \"Einstein\"\n"
 
 Review comment:
   Nice! I see through the gree UTests that the test still work in ES5 and 
works in ES6. Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136971)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136975
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211891058
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -125,7 +125,8 @@
  *
  * Optionally, you can provide {@link 
ElasticsearchIO.Write.FieldValueExtractFn} using {@code
  * withIndexFn()} or {@code withTypeFn()} to enable per-document routing to 
the target Elasticsearch
- * index and type.
+ * index (all versions) and type (version <6). Support for type routing was 
removed in Elasticsearch
+ * 6 (see 
https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch)
 
 Review comment:
   @dattran-vn01 In addition I don't see the type routing breaking change in 
v6.x breaking changes list. Are you sure of that? I only see that in 6.0 new 
created indexes can only contain one type. Please test type routing support and 
if it is still supported in 6.x please remove the Jdoc comments and leave the 
feature enabled


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136975)
Time Spent: 7h 20m  (was: 7h 10m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136970
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211968823
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTestCommon.java
 ##
 @@ -106,6 +110,62 @@ void testSizes() throws Exception {
 assertThat("Wrong estimated size", estimatedSize, 
greaterThan(AVERAGE_DOC_SIZE * numDocs));
   }
 
+  void testSplit5x6x() throws Exception {
 
 Review comment:
   You need to re-architecture the tests: 
   1. ok to introduce testSplit5x6x in TestCommon because ES5 and ES6 split the 
same and not ES2
   2. ok to introduce an IT version of this test for the same reason
   3. Change ESTestCommon.testSplit5x6x : it should be exactly the same as 
previous ES5Test.testSplit modulo the final keywords you were right to add. And 
UTests in ES5 and ES6 should just call 
elasticsearchIOTestCommon.testSplit5x6x(); not create the index.
   4. Change ESTestCommon.testITSplit5x6x: it should be exactly the same as 
previous ES5Test.testSplitVolume module the final keywords you were right to 
add.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136970)
Time Spent: 6h 50m  (was: 6h 40m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136973
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:47
Start Date: 22/Aug/18 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211939714
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-6/src/test/java/org/elasticsearch/bootstrap/JarHell.java
 ##
 @@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.elasticsearch.bootstrap;
+
+import java.util.function.Consumer;
+
+/**
+ * We need a real Elasticsearch instance to properly test the IO (split, slice 
API, scroll API,
+ * ...). Starting at ES 5, to have Elasticsearch embedded, we are forced to 
use Elasticsearch test
+ * framework. But this framework checks for class duplicates in classpath and 
it cannot be
+ * deactivated. When the class duplication come from a dependency, then it 
cannot be avoided.
+ * Elasticsearch community does not provide a way of deactivating the jar hell 
test, so skip it by
+ * making this hack. In this case duplicate class is class:
+ * org.apache.maven.surefire.report.SafeThrowable jar1: surefire-api-2.20.jar 
jar2:
+ * surefire-junit47-2.20.jar
+ */
+class JarHell {
+
+  @SuppressWarnings("EmptyMethod")
+  public static void checkJarHell(Consumer output) {
+System.out.println(
 
 Review comment:
   yes but printing it is unnecessary. please remove the System.out.println


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136973)
Time Spent: 7h 10m  (was: 7h)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=136977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136977
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 22/Aug/18 14:52
Start Date: 22/Aug/18 14:52
Worklog Time Spent: 10m 
  Work Description: dattran-vn01 commented on a change in pull request 
#6211: [BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r211985701
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -125,7 +125,8 @@
  *
  * Optionally, you can provide {@link 
ElasticsearchIO.Write.FieldValueExtractFn} using {@code
  * withIndexFn()} or {@code withTypeFn()} to enable per-document routing to 
the target Elasticsearch
- * index and type.
+ * index (all versions) and type (version <6). Support for type routing was 
removed in Elasticsearch
+ * 6 (see 
https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch)
 
 Review comment:
   Let me test it again and confirm. Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136977)
Time Spent: 7.5h  (was: 7h 20m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #1319

2018-08-22 Thread Apache Jenkins Server
See 


Changes:

[iemejia] [BEAM-5196] Add MD5 consistency check on S3 uploads (writes)

--
[...truncated 18.36 MB...]
Aug 22, 2018 3:01:10 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Staging pipeline description to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testreportfailures-jenkins-0822150058-616afaf/output/results/staging/
Aug 22, 2018 3:01:10 PM org.apache.beam.runners.dataflow.util.PackageUtil 
tryStagePackage
INFO: Uploading <115993 bytes, hash 39MxzH8P14zFeUV-f2L7Dg> to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testreportfailures-jenkins-0822150058-616afaf/output/results/staging/pipeline-39MxzH8P14zFeUV-f2L7Dg.pb

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_OUT
Dataflow SDK version: 2.7.0-SNAPSHOT

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_ERROR
Aug 22, 2018 3:01:12 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-08-22_08_01_11-15458516256039183116?project=apache-beam-testing

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_OUT
Submitted job: 2018-08-22_08_01_11-15458516256039183116

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_ERROR
Aug 22, 2018 3:01:12 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud dataflow jobs --project=apache-beam-testing cancel 
--region=us-central1 2018-08-22_08_01_11-15458516256039183116
Aug 22, 2018 3:01:12 PM org.apache.beam.runners.dataflow.TestDataflowRunner 
run
INFO: Running Dataflow job 2018-08-22_08_01_11-15458516256039183116 with 0 
expected assertions.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:11.259Z: Autoscaling is enabled for job 
2018-08-22_08_01_11-15458516256039183116. The number of workers will be between 
1 and 1000.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:11.310Z: Autoscaling was automatically enabled for 
job 2018-08-22_08_01_11-15458516256039183116.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:14.020Z: Checking required Cloud APIs are enabled.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:14.133Z: Checking permissions granted to controller 
Service Account.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:17.764Z: Worker configuration: n1-standard-1 in 
us-central1-f.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:18.279Z: Expanding CoGroupByKey operations into 
optimizable parts.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:18.554Z: Expanding GroupByKey operations into 
optimizable parts.
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:18.604Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:18.885Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:18.934Z: Elided trivial flatten 
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:18.981Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:19.032Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Aug 22, 2018 3:01:23 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-08-22T15:01:19.080Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindow

[jira] [Work logged] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5196?focusedWorklogId=136982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136982
 ]

ASF GitHub Bot logged work on BEAM-5196:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:07
Start Date: 22/Aug/18 15:07
Worklog Time Spent: 10m 
  Work Description: toelen commented on issue #6232: [BEAM-5196]  Add MD5 
consistency check on S3 uploads (writes)
URL: https://github.com/apache/beam/pull/6232#issuecomment-415066024
 
 
   Thanks @iemejia . Just out of curiosity, are there any tests that are being 
run before releasing that use actual cloud environments? Just thinking about 
BigQuery, S3, ... Is is quite easy to break something right now, and running 
integration tests on each supported cloud provider/service could have value


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136982)
Time Spent: 0.5h  (was: 20m)

> Add MD5 consistency check on S3 uploads (writes)
> 
>
> Key: BEAM-5196
> URL: https://issues.apache.org/jira/browse/BEAM-5196
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Leen Toelen
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3654) Port FilterExamplesTest off DoFnTester

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3654?focusedWorklogId=136987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136987
 ]

ASF GitHub Bot logged work on BEAM-3654:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:15
Start Date: 22/Aug/18 15:15
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #6194: [BEAM-3654] 
Port FilterExamplesTest off DoFnTester
URL: https://github.com/apache/beam/pull/6194#issuecomment-415069004
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136987)
Time Spent: 1h 10m  (was: 1h)

> Port FilterExamplesTest off DoFnTester
> --
>
> Key: BEAM-3654
> URL: https://issues.apache.org/jira/browse/BEAM-3654
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Mikhail
>Priority: Major
>  Labels: beginner, newbie, starter
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4823) Create SNS IO

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4823?focusedWorklogId=136988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136988
 ]

ASF GitHub Bot logged work on BEAM-4823:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:16
Start Date: 22/Aug/18 15:16
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6202: [BEAM-4823] - Adds a 
Sink to write to Amazon's SNS
URL: https://github.com/apache/beam/pull/6202#issuecomment-415069251
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136988)
Time Spent: 3h 40m  (was: 3.5h)

> Create SNS IO 
> --
>
> Key: BEAM-4823
> URL: https://issues.apache.org/jira/browse/BEAM-4823
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: Not applicable
>Reporter: Ankit Jhalaria
>Assignee: Ankit Jhalaria
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5156) Apache Beam on dataflow runner can't find Tensorflow for workers

2018-08-22 Thread Rasmi Elasmar (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589003#comment-16589003
 ] 

Rasmi Elasmar commented on BEAM-5156:
-

[~cyberush], is the new problem related to this one? Can you share details?

> Apache Beam on dataflow runner can't find Tensorflow for workers
> 
>
> Key: BEAM-5156
> URL: https://issues.apache.org/jira/browse/BEAM-5156
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
> Environment: google cloud compute instance running linux
>Reporter: Thomas Johns
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.5.0, 2.6.0
>
>
> Adding serialized tensorflow model to apache beam pipeline with python sdk 
> but it can not find any version of tensorflow when applied to dataflow runner 
> although it is not a problem locally. Tried various versions of tensorflow 
> from 1.6 to 1.10. I thought it might be a conflicting package some where so I 
> removed all other packages and tried to just install tensorflow and same 
> problem.
> Could not find a version that satisfies the requirement tensorflow==1.6.0 
> (from -r reqtest.txt (line 59)) (from versions: )No matching distribution 
> found for tensorflow==1.6.0 (from -r reqtest.txt (line 59))



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5105) Move load job poll to finishBundle() method to better parallelize execution

2018-08-22 Thread Nick Orlove (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589002#comment-16589002
 ] 

Nick Orlove commented on BEAM-5105:
---

This is an issue I've experienced

> Move load job poll to finishBundle() method to better parallelize execution
> ---
>
> Key: BEAM-5105
> URL: https://issues.apache.org/jira/browse/BEAM-5105
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Chamikara Jayalath
>Priority: Major
>
> It appears that when we write to BigQuery using WriteTablesDoFn we start a 
> load job and wait for that job to finish.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L318]
>  
> In cases where we are trying to write a PCollection of tables (for example, 
> when user use dynamic destinations feature) this relies on dynamic work 
> rebalancing to parallellize execution of load jobs. If the runner does not 
> support dynamic work rebalancing or does not execute dynamic work rebalancing 
> from some reason this could have significant performance drawbacks. For 
> example, scheduling times for load jobs will add up.
>  
> A better approach might be to start load jobs at process() method but wait 
> for all load jobs to finish at finishBundle() method. This will parallelize 
> any overheads as well as job execution (assuming more than one job is 
> schedule by BQ.).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5184) Multimap side inputs with duplicate keys and values are being lost

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5184?focusedWorklogId=136990&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136990
 ]

ASF GitHub Bot logged work on BEAM-5184:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:29
Start Date: 22/Aug/18 15:29
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6257: [BEAM-5184] Multimap 
side inputs with duplicate keys and values are being lost
URL: https://github.com/apache/beam/pull/6257#issuecomment-415074170
 
 
   This seems to break Dataflow, it's side input handling is different.
   
   The Jenkins logs for 
org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInputWithNonDeterministicKeyCoder
 fail with
   
   Expected: iterable over [, , , 
, ] in any order
but: No item matches:  in [, , , ]
   
   I'll try to take a look as to why this is failing as the error message is 
implying a comparison issue since all the values do exist in the actual output 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136990)
Time Spent: 1.5h  (was: 1h 20m)

> Multimap side inputs with duplicate keys and values are being lost
> --
>
> Key: BEAM-5184
> URL: https://issues.apache.org/jira/browse/BEAM-5184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Side inputs with duplicate values are being lost due to the usage of a set 
> based multimap.
> [https://github.com/apache/beam/blob/05fb694f265dda0254d7256e938e508fec9ba098/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L293]
>  
> Originating thread: 
> [https://lists.apache.org/thread.html/48bae7cf71bf6851622cdee0e8bc8619c79c4c2273ed63f288202169@%3Cdev.beam.apache.org%3E]
>  
> Please update the existing tests to exercise this scenario as well: 
> https://github.com/apache/beam/blob/9f23ffc97535e7255245f3852b9d2f0939df5a0a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L507



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589016#comment-16589016
 ] 

Etienne Chauchot commented on BEAM-5107:


No problem [~dattran.vn01] thanks to contribute to Beam ! 

Please do not close you PR you can still rebase onto master and we will squash 
commits at the end. It was just a comment on the good way to sync with master 
for your future contributions.

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-22 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589016#comment-16589016
 ] 

Etienne Chauchot edited comment on BEAM-5107 at 8/22/18 3:29 PM:
-

No problem [~dattran.vn01] thanks to contribute to Beam ! 

Please do not close your PR you can still rebase onto master and we will squash 
commits at the end. It was just a comment on the good way to sync with master 
for your future contributions.


was (Author: echauchot):
No problem [~dattran.vn01] thanks to contribute to Beam ! 

Please do not close you PR you can still rebase onto master and we will squash 
commits at the end. It was just a comment on the good way to sync with master 
for your future contributions.

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5184) Multimap side inputs with duplicate keys and values are being lost

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5184?focusedWorklogId=136991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136991
 ]

ASF GitHub Bot logged work on BEAM-5184:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:30
Start Date: 22/Aug/18 15:30
Worklog Time Spent: 10m 
  Work Description: lukecwik edited a comment on issue #6257: [BEAM-5184] 
Multimap side inputs with duplicate keys and values are being lost
URL: https://github.com/apache/beam/pull/6257#issuecomment-415074170
 
 
   This seems to break Dataflow, it's side input handling is different.
   
   The Jenkins logs for 
`org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInputWithNonDeterministicKeyCoder`
 fail with
   
   ```
   Expected: iterable over [, , , 
, ] in any order
but: No item matches:  in [, , , ]
   ```
   
   I'll try to take a look as to why this is failing as the error message is 
implying a comparison issue since all the values do exist in the actual output 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136991)
Time Spent: 1h 40m  (was: 1.5h)

> Multimap side inputs with duplicate keys and values are being lost
> --
>
> Key: BEAM-5184
> URL: https://issues.apache.org/jira/browse/BEAM-5184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Side inputs with duplicate values are being lost due to the usage of a set 
> based multimap.
> [https://github.com/apache/beam/blob/05fb694f265dda0254d7256e938e508fec9ba098/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L293]
>  
> Originating thread: 
> [https://lists.apache.org/thread.html/48bae7cf71bf6851622cdee0e8bc8619c79c4c2273ed63f288202169@%3Cdev.beam.apache.org%3E]
>  
> Please update the existing tests to exercise this scenario as well: 
> https://github.com/apache/beam/blob/9f23ffc97535e7255245f3852b9d2f0939df5a0a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L507



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5184) Multimap side inputs with duplicate keys and values are being lost

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5184?focusedWorklogId=136992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136992
 ]

ASF GitHub Bot logged work on BEAM-5184:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:31
Start Date: 22/Aug/18 15:31
Worklog Time Spent: 10m 
  Work Description: lukecwik edited a comment on issue #6257: [BEAM-5184] 
Multimap side inputs with duplicate keys and values are being lost
URL: https://github.com/apache/beam/pull/6257#issuecomment-415074170
 
 
   I'll try to look at the failure in Dataflow, it's side input handling is 
different.
   
   The Jenkins logs for 
`org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInputWithNonDeterministicKeyCoder`
 fail with
   
   ```
   Expected: iterable over [, , , 
, ] in any order
but: No item matches:  in [, , , ]
   ```
   
   The error message is implying a comparison issue since all the values do 
exist in the actual output 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136992)
Time Spent: 1h 50m  (was: 1h 40m)

> Multimap side inputs with duplicate keys and values are being lost
> --
>
> Key: BEAM-5184
> URL: https://issues.apache.org/jira/browse/BEAM-5184
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Side inputs with duplicate values are being lost due to the usage of a set 
> based multimap.
> [https://github.com/apache/beam/blob/05fb694f265dda0254d7256e938e508fec9ba098/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L293]
>  
> Originating thread: 
> [https://lists.apache.org/thread.html/48bae7cf71bf6851622cdee0e8bc8619c79c4c2273ed63f288202169@%3Cdev.beam.apache.org%3E]
>  
> Please update the existing tests to exercise this scenario as well: 
> https://github.com/apache/beam/blob/9f23ffc97535e7255245f3852b9d2f0939df5a0a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L507



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2934) Tez support for portable side input

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2934.
---
   Resolution: Later
Fix Version/s: Not applicable

> Tez support for portable side input
> ---
>
> Key: BEAM-2934
> URL: https://issues.apache.org/jira/browse/BEAM-2934
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-tez
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2911) Tez supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2911.
---
   Resolution: Later
Fix Version/s: Not applicable

> Tez supports portable progress reporting
> 
>
> Key: BEAM-2911
> URL: https://issues.apache.org/jira/browse/BEAM-2911
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-tez
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2922) Tez support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2922.
---
   Resolution: Later
Fix Version/s: Not applicable

> Tez support for portable user state
> ---
>
> Key: BEAM-2922
> URL: https://issues.apache.org/jira/browse/BEAM-2922
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-tez
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2912) MapReduce supports portable progress reporting

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2912.
---
   Resolution: Later
Fix Version/s: Not applicable

> MapReduce supports portable progress reporting
> --
>
> Key: BEAM-2912
> URL: https://issues.apache.org/jira/browse/BEAM-2912
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-mapreduce
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5196) Add MD5 consistency check on S3 uploads (writes)

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5196?focusedWorklogId=136993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136993
 ]

ASF GitHub Bot logged work on BEAM-5196:


Author: ASF GitHub Bot
Created on: 22/Aug/18 15:35
Start Date: 22/Aug/18 15:35
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6232: [BEAM-5196]  Add MD5 
consistency check on S3 uploads (writes)
URL: https://github.com/apache/beam/pull/6232#issuecomment-415076310
 
 
   @toelen Totally agree, we have IT for most of the Google IOs, for AWS we 
have only Kinesis but it is not part of the Beam CI yet, to put this in place 
we will need some company donating a bit to pay the AWS bills (we considered a 
docker with localstack at some moment but well it be still not a 'real' test).
   
   In the meantime we need to prepare code for this (there is ongoing work on 
new AWS connectors e.g. SNS, SQS, Kinesis so this will matter).  All 
contributions in this front are welcome :) I mentioned @lgajowy in the PR 
because he lead the work on file-based tests (gs://, hdfs://) so if you are 
interested in contributing in this area maybe it is worth to take a look at 
what he did, in principle it should be straightforward to add s3. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 136993)
Time Spent: 40m  (was: 0.5h)

> Add MD5 consistency check on S3 uploads (writes)
> 
>
> Key: BEAM-5196
> URL: https://issues.apache.org/jira/browse/BEAM-5196
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Leen Toelen
>Priority: Minor
> Fix For: 2.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-2923) MapReduce support for portable user state

2018-08-22 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde closed BEAM-2923.
---
   Resolution: Later
Fix Version/s: Not applicable

> MapReduce support for portable user state
> -
>
> Key: BEAM-2923
> URL: https://issues.apache.org/jira/browse/BEAM-2923
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-mapreduce
>Reporter: Henning Rohde
>Priority: Minor
>  Labels: portability
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   >