[GitHub] echauchot commented on a change in pull request #4143: Faster implementation of nexmark.Generator.nextExactString().

2017-11-23 Thread GitBox
echauchot commented on a change in pull request #4143: Faster implementation of 
nexmark.Generator.nextExactString().
URL: https://github.com/apache/beam/pull/4143#discussion_r152771825
 
 

 ##
 File path: 
sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/Generator.java
 ##
 @@ -353,8 +353,16 @@ private static String nextString(Random random, int 
maxLength) {
   /** Return a random string of exactly {@code length}. */
   private static String nextExactString(Random random, int length) {
 StringBuilder sb = new StringBuilder();
+int rnd = 0;
+int n = 0;  // number of random characters left in rnd
 while (length-- > 0) {
-  sb.append((char) ('a' + random.nextInt(26)));
+  if (n == 0) {
+rnd = random.nextInt();
+n = 6;  // log_26(2^31)
+  }
+  sb.append((char) ('a' + rnd % 26));
+  rnd /= 26;
 
 Review comment:
   We will concatenate ints between 0 and 25 as before. But if `rnd` happens to 
be < 26 then it will be set in this line to 0 for the next `n` iterations (6 at 
worse) leading to appending 0 to `sb`.  So there will be a more probable 0 than 
previous implementation. Consider using `Random.intervalNextInt(length*26, 
Integer.MAX_VALUE)`. But it might be not a big deal


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] echauchot commented on a change in pull request #4128: [BEAM-3181][Nexmark][SQL] Implement query0

2017-11-23 Thread GitBox
echauchot commented on a change in pull request #4128: 
[BEAM-3181][Nexmark][SQL] Implement query0
URL: https://github.com/apache/beam/pull/4128#discussion_r152798660
 
 

 ##
 File path: 
sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/model/sql/adapter/ModelFieldsAdapters.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.nexmark.model.sql.adapter;
+
+import com.google.common.collect.ImmutableMap;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.beam.sdk.nexmark.model.Auction;
+import org.apache.beam.sdk.nexmark.model.Bid;
+import org.apache.beam.sdk.nexmark.model.Person;
+
+/**
+ * Maps Java model classes to Beam SQL record types.
+ */
+public class ModelFieldsAdapters {
+
+  public static final Map ADAPTERS =
+  ImmutableMap.builder()
+  .put(Auction.class, auctionAdapter())
+  .put(Bid.class, bidAdapter())
+  .put(Person.class, personAdapter())
+  .build();
+
+  private static ModelFieldsAdapter personAdapter() {
+return new ModelFieldsAdapter(
+BeamRecordSqlTypeBuilder.of()
+.withLongField("id")
+.withStringField("name")
 
 Review comment:
   You're right it is out of the scope of this PR. all my point was to make the 
fields mapping generic enough to be included in sql package (like you extracted 
the other feature) rather than coding it for each pojo. Like you said above 
implementing `getFieldValue(pojo, fieldName)` using reflection might be a good 
thing if it is feasible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi opened a new pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-23 Thread GitBox
szewi opened a new pull request #4169: [BEAM-3060] Added support for multiple 
filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi commented on issue #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-23 Thread GitBox
szewi commented on issue #4169: [BEAM-3060] Added support for multiple 
filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#issuecomment-346620194
 
 
   Hi @chamikaramj , can you please take a look? This allow to switch between 
filesystems by adding system property -Dfilesystem and provide filesystem 
specific pipeline options.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-23 Thread GitBox
lgajowy commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#issuecomment-346653319
 
 
   @chamikaramj (this message is a kind reminder) :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia opened a new pull request #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-23 Thread GitBox
iemejia opened a new pull request #4170: Make ParDoLifecycleTest exception 
tests part of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-23 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346703242
 
 
   As part of the solution for the issue on the Spark runner (see BEAM-3187) I 
consider that enabling these tests for all runners makes sense.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-23 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704040
 
 
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-23 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704033
 
 
   Run Apex ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-23 Thread GitBox
chamikaramj commented on a change in pull request #4149: [BEAM-3060] Add 
Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152900718
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String 
filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-PCollection testFilenames = pipeline
-.apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
-.apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
-.getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
+  public static class UncompressedTextIOIT {
+
+@Rule
+public TestPipeline pipeline = TestPipeline.create();
+
+@Test
+public void writeThenReadAll() {
+  PCollection testFilenames = pipeline
+  .apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
+  .apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
+  .apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
+  .getPerDestinationOutputFilenames().apply(Values.create());
+
+  PCollection consolidatedHashcode = testFilenames
+  .apply("Read all files", TextIO.readAll())
+  .apply("Calculate hashcode", Combine.globally(new HashingFn()));
+
+  String expectedHash = getExpectedHashForLineCount(numberOfTextLines);
+  PAssert.thatSingleton(consolidatedHashcode).isEqualTo(expectedHash);
+
+  testFilenames.apply("Delete test files", ParDo.of(new DeleteFileFn())
+  
.withSideInputs(consolidatedHashcode.apply(View.asSingleton(;
+
+  pipeline.run().waitUntilFinish();
+}
+  }
+
+  /** IO IT with various compression types. */
+  @RunWith(Parameterized.class)
+  public static class CompressedTextIOIT {
+
+@Rule
+public TestPipeline pipeline = TestPipeline.create();
+
+@Parameterized.Parameters()
+public static Iterable data() {
+  return ImmutableList.builder()
+  .add(GZIP)
+  .add(DEFLATE)
+  .add(BZIP2)
+  .build();
+}
+
+@Parameterized.Parameter()
+public Compression compression;
+
+@Test
+public void writeThenReadAllWithCompression() {
+  TextIO.TypedWrite write = TextIO
+  .write()
+  .to(filenamePrefix)
+  .withOutputFilenames()
+  .withCompression(compression);
+
+  TextIO.ReadAll read = TextIO.readAll().withCompression(AUTO);
 
-PCollection consolidatedHashcode = testFilenames
-.apply("Read all files", TextIO.readAll())
-.apply("Calculate hashcode", Combine.globally(new HashingFn()));
+  PCollection testFilenames = pipeline
 
 Review comment:
   This and uncompressed version have the same pipeline. Can't we share to code 
between tests (and keep the same test class TextIOIT) and add "compression 
type" as a parameter to the test (a Maven -D parameter for the perfkit based 
runs) ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-23 Thread GitBox
chamikaramj commented on a change in pull request #4149: [BEAM-3060] Add 
Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152900262
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String 
filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-PCollection testFilenames = pipeline
-.apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
-.apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
-.getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
+  public static class UncompressedTextIOIT {
 
 Review comment:
   Does perfkitbenchmarker-based execution 
(https://github.com/apache/beam/pull/4120) still work with these changes ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-23 Thread GitBox
chamikaramj commented on a change in pull request #4149: [BEAM-3060] Add 
Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152900435
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String 
filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-PCollection testFilenames = pipeline
-.apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
-.apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
-.getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
 
 Review comment:
   This means that this test will be picked up by all test suites (including 
Java pre-commit), isn't it ? Not sure if we want to do that due to the size of 
this test. Adding to post-commit tests should be fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] dmytroivanov4206 opened a new pull request #4171: [BEAM-3008] Extends API for BigtableIO Read and Write by adding withInstanceId and withProjectId

2017-11-24 Thread GitBox
dmytroivanov4206 opened a new pull request #4171: [BEAM-3008] Extends API for 
BigtableIO Read and Write by adding withInstanceId  and withProjectId 
URL: https://github.com/apache/beam/pull/4171
 
 
   Adds withInstanceId and withProjectId to the BigtableIO Read and Write 
classes, first out of four steps to fix [BEAM-3008] bug.
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [x] Each commit in the pull request should have a meaningful subject line 
and body.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rmannibucau opened a new pull request #4172: BEAM-3243 support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-24 Thread GitBox
rmannibucau opened a new pull request #4172: BEAM-3243 support multiple 
anonymous classes from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172
 
 
   Idea is to keep the "number" (suffix) of the anonymous class for anonymous 
dofn to support multiple anonymous dofn in the same pipeline.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rmannibucau opened a new pull request #4173: [DISCUSS] add a java profile to be able to skip python/go when not relevant for current work

2017-11-24 Thread GitBox
rmannibucau opened a new pull request #4173: [DISCUSS] add a java profile to be 
able to skip python/go when not relevant for current work
URL: https://github.com/apache/beam/pull/4173
 
 
   Often working on a feature or even more on a fix you only care about a 
language - which is probably most of the time java?
   
   When building the project, the python execution time is very important (like 
half of it on my machine). However you are sure you didn't affect it since the 
code is quite parallel and almost unrelated in term of dependency.
   
   This PR adds a java profile which skips python/go sdk when building. It is 
designed to be activable through a property you can put in your settings.xml if 
you always only work with java part of beam.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xumingming commented on issue #4168: [BEAM-3238][SQL] Add BeamRecordSqlTypeBuilder

2017-11-24 Thread GitBox
xumingming commented on issue #4168: [BEAM-3238][SQL] Add 
BeamRecordSqlTypeBuilder
URL: https://github.com/apache/beam/pull/4168#issuecomment-346831152
 
 
   I like this idea!
   
   One minor comment: Can we put the `BeamRecordSqlTypeBuilder` inside 
`BeamRecordSqlType`? it will keep the surface api of 
`org.apache.beam.sdk.extensions.sql` cleaner.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sduskis commented on a change in pull request #4171: [BEAM-3008] Extends API for BigtableIO Read and Write by adding withInstanceId and withProjectId

2017-11-24 Thread GitBox
sduskis commented on a change in pull request #4171: [BEAM-3008] Extends API 
for BigtableIO Read and Write by adding withInstanceId  and withProjectId 
URL: https://github.com/apache/beam/pull/4171#discussion_r152978149
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 ##
 @@ -78,38 +78,38 @@
  * The Bigtable source returns a set of rows from a single table, returning 
a
  * {@code PCollection}.
  *
- * To configure a Cloud Bigtable source, you must supply a table id and a 
{@link BigtableOptions}
- * or builder configured with the project and other information necessary to 
identify the
- * Bigtable instance. By default, {@link BigtableIO.Read} will read all rows 
in the table. The row
- * range to be read can optionally be restricted using {@link 
BigtableIO.Read#withKeyRange}, and
- * a {@link RowFilter} can be specified using {@link 
BigtableIO.Read#withRowFilter}. For example:
+ * To configure a Cloud Bigtable source, you must supply a table id, a 
project id, an instance
+ * id and optionally a {@link BigtableOptions} to provide more specific 
connection configuration.
+ * By default, {@link BigtableIO.Read} will read all rows in the table. The 
row range to be read
+ * can optionally be restricted using {@link BigtableIO.Read#withKeyRange}, 
and a {@link RowFilter}
+ * can be specified using {@link BigtableIO.Read#withRowFilter}. For example:
  *
  * {@code
- * BigtableOptions.Builder optionsBuilder =
- * new BigtableOptions.Builder()
- * .setProjectId("project")
- * .setInstanceId("instance");
  *
  * Pipeline p = ...;
  *
  * // Scan the entire table.
  * p.apply("read",
  * BigtableIO.read()
  * .withBigtableOptions(optionsBuilder)
 
 Review comment:
   Can you please remove `.withBigtableOptions(optionsBuilder)` for this 
example?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-24 Thread GitBox
lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed 
TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152998002
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String 
filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-PCollection testFilenames = pipeline
-.apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
-.apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
-.getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
 
 Review comment:
   I double-checked that by running the preCommit job on my machine - those are 
not fired in PreCommit phase. Also, out of curiosity ? I investigated a little 
bit the project's mvn structure:
   
   besides the `@RunWith(JUnit.class)` annotation that is required by JUnit, we 
have two mvn plugins that look (scan) for tests:
 - surefire (looks for unit tests and searches for classes with *Test 
suffix)
 - failsafe (looks for integration tests and searches for classes with *IT 
suffix)
   
   As failsafe is not fired in the PreCommit phase, the tests are not invoked. 
Please look at [io parent 
pom](https://github.com/apache/beam/blob/master/sdks/java/io/pom.xml#L77), 
where failsafe plugin is activated only when io-it profile is active. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-24 Thread GitBox
lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed 
TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152998058
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String 
filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-PCollection testFilenames = pipeline
-.apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
-.apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
-.getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
+  public static class UncompressedTextIOIT {
 
 Review comment:
   Yes, it works but runs all the 4 tests that are there in the file. But now I 
think this is probably not what we want. This won't be a problem as you 
suggested an even better solution in the comment below. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-24 Thread GitBox
lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed 
TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152999363
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String 
filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-PCollection testFilenames = pipeline
-.apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
-.apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
-.getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
+  public static class UncompressedTextIOIT {
+
+@Rule
+public TestPipeline pipeline = TestPipeline.create();
+
+@Test
+public void writeThenReadAll() {
+  PCollection testFilenames = pipeline
+  .apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
+  .apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
+  .apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
+  .getPerDestinationOutputFilenames().apply(Values.create());
+
+  PCollection consolidatedHashcode = testFilenames
+  .apply("Read all files", TextIO.readAll())
+  .apply("Calculate hashcode", Combine.globally(new HashingFn()));
+
+  String expectedHash = getExpectedHashForLineCount(numberOfTextLines);
+  PAssert.thatSingleton(consolidatedHashcode).isEqualTo(expectedHash);
+
+  testFilenames.apply("Delete test files", ParDo.of(new DeleteFileFn())
+  
.withSideInputs(consolidatedHashcode.apply(View.asSingleton(;
+
+  pipeline.run().waitUntilFinish();
+}
+  }
+
+  /** IO IT with various compression types. */
+  @RunWith(Parameterized.class)
+  public static class CompressedTextIOIT {
+
+@Rule
+public TestPipeline pipeline = TestPipeline.create();
+
+@Parameterized.Parameters()
+public static Iterable data() {
+  return ImmutableList.builder()
+  .add(GZIP)
+  .add(DEFLATE)
+  .add(BZIP2)
+  .build();
+}
+
+@Parameterized.Parameter()
+public Compression compression;
+
+@Test
+public void writeThenReadAllWithCompression() {
+  TextIO.TypedWrite write = TextIO
+  .write()
+  .to(filenamePrefix)
+  .withOutputFilenames()
+  .withCompression(compression);
+
+  TextIO.ReadAll read = TextIO.readAll().withCompression(AUTO);
 
-PCollection consolidatedHashcode = testFilenames
-.apply("Read all files", TextIO.readAll())
-.apply("Calculate hashcode", Combine.globally(new HashingFn()));
+  PCollection testFilenames = pipeline
 
 Review comment:
   I think it's hard to do right now without modifying perfkit's code. As we 
checked, perfkit ignores -D parameters because builds the mvn verify command by 
itself from the parameters passed . I think this could be done in some future 
contribution. We will file a bug report in perfkit soon. 
   
   I think the best solution (at least for now) is to leave the compression 
type in pipeline options. We pass them to perfkit either way (through 
`beam_it_options`) and, what imo is more important, compressionType is very 
test specific (same as numberOfRecords). WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xumingming commented on issue #4173: [DISCUSS] add a java profile to be able to skip python/go when not relevant for current work

2017-11-24 Thread GitBox
xumingming commented on issue #4173: [DISCUSS] add a java profile to be able to 
skip python/go when not relevant for current work
URL: https://github.com/apache/beam/pull/4173#issuecomment-346862839
 
 
   I like this idea.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-24 Thread GitBox
lgajowy commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#issuecomment-346866400
 
 
   @chamikaramj thanks for the review! Here's another batch of changes, as 
commented above. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346871884
 
 
   Run Flink ValidatesRunner
   Run Spark ValidatesRunner
   Run Dataflow ValidatesRunner
   Run Apex ValidatesRunner
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346703983
 
 
   Run Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872487
 
 
   Run Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704033
 
 
   Run Apex ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704040
 
 
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704003
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346703990
 
 
   Run Spark ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872593
 
 
   Run Apex ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872574
 
 
   Run Spark ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346871884
 
 
   Run Flink ValidatesRunner
   Run Spark ValidatesRunner
   Run Dataflow ValidatesRunner
   Run Apex ValidatesRunner
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872611
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872620
 
 
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia opened a new pull request #4174: [BEAM-3244] Ensure execution of teardown method on Flink's DoFnOperator

2017-11-24 Thread GitBox
iemejia opened a new pull request #4174: [BEAM-3244] Ensure execution of 
teardown method on Flink's DoFnOperator
URL: https://github.com/apache/beam/pull/4174
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [x] Each commit in the pull request should have a meaningful subject line 
and body.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4174: [BEAM-3244] Ensure execution of teardown method on Flink's DoFnOperator

2017-11-24 Thread GitBox
iemejia commented on issue #4174: [BEAM-3244] Ensure execution of teardown 
method on Flink's DoFnOperator
URL: https://github.com/apache/beam/pull/4174#issuecomment-346875689
 
 
   Run Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on a change in pull request #4174: [BEAM-3244] Ensure execution of teardown method on Flink's DoFnOperator

2017-11-24 Thread GitBox
iemejia commented on a change in pull request #4174: [BEAM-3244] Ensure 
execution of teardown method on Flink's DoFnOperator
URL: https://github.com/apache/beam/pull/4174#discussion_r153012280
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
 ##
 @@ -380,7 +386,6 @@ public void close() throws Exception {
   }
 }
 checkFinishBundleTimer.cancel(true);
 
 Review comment:
   @aljoscha I had the doubt if this one should be moved to dispose too, given 
that close can eventually not be called. WDYT ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reuvenlax commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-24 Thread GitBox
reuvenlax commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153034261
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -339,50 +297,189 @@ public boolean isWindowedWrites() {
 sink, computeNumShards, numShardsProvider, true, 
maxNumWritersPerBundle, sideInputs);
   }
 
-  private static class WriterKey {
-private final BoundedWindow window;
-private final PaneInfo paneInfo;
-private final DestinationT destination;
+  @Override
+  public void validate(PipelineOptions options) {
+sink.validate(options);
+  }
 
-WriterKey(BoundedWindow window, PaneInfo paneInfo, DestinationT 
destination) {
-  this.window = window;
-  this.paneInfo = paneInfo;
-  this.destination = destination;
+  @Override
+  public WriteFilesResult expand(PCollection input) {
+if (input.isBounded() == IsBounded.UNBOUNDED) {
+  checkArgument(
+  windowedWrites,
+  "Must use windowed writes when applying %s to an unbounded 
PCollection",
+  WriteFiles.class.getSimpleName());
+}
+if (windowedWrites) {
+  // The reason for this is https://issues.apache.org/jira/browse/BEAM-1438
+  // and similar behavior in other runners.
+  checkArgument(
+  computeNumShards != null || numShardsProvider != null,
+  "When using windowed writes, must specify number of output shards 
explicitly",
+  WriteFiles.class.getSimpleName());
 }
+this.writeOperation = sink.createWriteOperation();
+this.writeOperation.setWindowedWrites(windowedWrites);
 
-@Override
-public boolean equals(Object o) {
-  if (!(o instanceof WriterKey)) {
-return false;
-  }
-  WriterKey other = (WriterKey) o;
-  return Objects.equal(window, other.window)
-  && Objects.equal(paneInfo, other.paneInfo)
-  && Objects.equal(destination, other.destination);
+if (!windowedWrites) {
+  // Re-window the data into the global window and remove any existing 
triggers.
+  input =
+  input.apply(
+  "RewindowIntoGlobal",
+  Window.into(new GlobalWindows())
+  .triggering(DefaultTrigger.of())
+  .discardingFiredPanes());
+}
+
+Coder destinationCoder;
+try {
+  destinationCoder =
+  getDynamicDestinations()
+  
.getDestinationCoderWithDefault(input.getPipeline().getCoderRegistry());
+  destinationCoder.verifyDeterministic();
+} catch (CannotProvideCoderException | NonDeterministicException e) {
+  throw new RuntimeException(e);
+}
+@SuppressWarnings("unchecked")
+Coder windowCoder =
+(Coder) 
input.getWindowingStrategy().getWindowFn().windowCoder();
+FileResultCoder fileResultCoder =
+FileResultCoder.of(windowCoder, destinationCoder);
+
+PCollectionView numShardsView =
+(computeNumShards == null) ? null : input.apply(computeNumShards);
+
+PCollection> tempFileResults =
+(computeNumShards == null && numShardsProvider == null)
+? input.apply(
+"WriteUnshardedBundlesToTempFiles",
 
 Review comment:
   Unfortunately, refactoring into new PTransforms changes the name of every 
single sub step (since step names are hierarchical).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reuvenlax commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-24 Thread GitBox
reuvenlax commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r152652760
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -824,177 +826,78 @@ public void startBundle() {
 public void processElement(ProcessContext c) {
   fileResults.add(c.element());
   if (fixedNumShards == null) {
-if (numShardsView != null) {
-  fixedNumShards = c.sideInput(numShardsView);
-} else if (numShardsProvider != null) {
-  fixedNumShards = numShardsProvider.get();
-} else {
-  throw new IllegalStateException(
-  "When finalizing a windowed write, should have set fixed 
sharding");
-}
+fixedNumShards = getFixedNumShards.apply(c);
+checkState(fixedNumShards != null, "Windowed write should have set 
fixed sharding");
 
 Review comment:
   Windowed (non triggered) writes in batch do not need fixed sharding


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-25 Thread GitBox
chamikaramj commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#issuecomment-346929621
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-25 Thread GitBox
chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153042053
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -139,6 +139,24 @@
 
 
 
+
+
+google-cloud-storage
+
+
+filesystem
+GCS
 
 Review comment:
   Does this require GCS to be all caps ? If so is there a way to not require 
that ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-25 Thread GitBox
chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153042101
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -81,13 +81,38 @@ public static void setup() throws ParseException {
 .as(IOTestPipelineOptions.class);
 
 numberOfTextLines = options.getNumberOfRecords();
-filenamePrefix = appendTimestamp(options.getFilenamePrefix());
+filenamePrefix = resolveProtocolAndPath(options);
   }
 
   private static String appendTimestamp(String filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
+  private static String resolveProtocolAndPath(IOTestPipelineOptions options) {
 
 Review comment:
   I'm not sure why we need to parse and reassemble protocol here. We shouldn't 
have to do this if we ask user to give the full prefix that includes the 
protocol.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-25 Thread GitBox
chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153042000
 
 

 ##
 File path: 
sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 ##
 @@ -100,4 +100,14 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("Google cloud storage - bucket_name/path")
+  String getGcsLocation();
 
 Review comment:
   Why don't we use fileNamePrefix for all file-systems instead of introducing 
a property per file-system ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes 
from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347018533
 
 
   Using multiple anonymous DoFn's with the same enclosing class within the 
same composite transform is already possible if you specify the transform name 
in .apply() - e.g.: .apply("Something", ParDo.of(new DoFn..)).apply("Something 
else", ParDo.of(new DoFn..)). This is a good thing rather than a bug, because 
using generated names like `Enclosing$1` is unstable w.r.t. pipeline update: 
any reordering of the anonymous classes, or adding a new one, or making an 
existing one be non-anonymous, will change the numbering and make the pipeline 
update-incompatible.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous 
classes from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347019445
 
 
   I understand where it comes from but it is very boring when writing tests 
where it is not uncommon to write anonymous classes and the inline naming is 
not always doable when you write utility methods or reusable piece of pipelines.
   
   You would also note that anonymous algorithm with the "number" is somehow 
aligned on the fact to keep only the suffix in the nested class case which 
leads to as meaningless names (Important$Stuff leads to Stuff which is in 
general not very meaningful for the "task" context since it is hold by the 
enclosing class to avoid long and repeating names).
   
   The original issue is when it fails it is quite abstract and not very 
obvious. An alternative can be to enrich the error message with:
   
   1. where the anonymous *classes* (all conflicting ones) are
   2. how to fix it - passing a name to the apply
   
   I would be fine with this "not solution" fix as well, does it sound better 
for you?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes 
from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347029760
 
 
   Yeah, I would prefer an improved error message. I suppose you mean the 
message that says `Transform Foo does not have a stable unique name. This will 
prevent reloading of pipelines.` - that message definitely should point to 
specifying the name in .apply() as a fix.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous 
classes from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347035725
 
 
   Something like

   Transform Foo$1 conflicts with Foo$2 in pipeline defined in MyTest line 
56. You can fix it adding a name in apply invocations line 54 and 52.
   
   The long name is important cause otherwise you have no clue of the 
provenance.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tweise commented on issue #4074: [BEAM-3130] View.asMap() causes a ClassCastException in Apex runner

2017-11-26 Thread GitBox
tweise commented on issue #4074: [BEAM-3130] View.asMap() causes a 
ClassCastException in Apex runner
URL: https://github.com/apache/beam/pull/4074#issuecomment-347068452
 
 
   @jkff tests are added


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] luke-zhu opened a new pull request #4176: [BEAM-3143] Type Inference Compatibility with Python 3

2017-11-26 Thread GitBox
luke-zhu opened a new pull request #4176: [BEAM-3143] Type Inference 
Compatibility with Python 3
URL: https://github.com/apache/beam/pull/4176
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   Builds of Holden's work to get a type inference solution that passes 
precommit tests on Python 2 and type inference unit tests on Python 3.5.
   
   The disassembler code may need more changes if we aim for 3.6+ due to the 
byteword to quadword change. I've ported some code from CPython's lib/dis to 
disassembly.py This should make any future migration to 3.6+ easier.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058028
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
+proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+http_class: Optional custom Http class to use.
 
 Review comment:
   Nit: s/Optiona/optional and s/Arguments/arguments


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153057996
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
 
 Review comment:
   Why '_' ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058033
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
+proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+http_class: Optional custom Http class to use.
+**kwargs: Arguments to pass to http_class constructor.
+  Returns:
+An initialized httplib2.Http instance.
+  """
+  proxy_info = httplib2.ProxyInfo(
+proxy_type=3,
+proxy_host=None,
 
 Review comment:
   Why do we have to specify 'None' values here instead of leaving default ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058019
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
 
 Review comment:
   Nit: s/Environment/environment


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153057981
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
 
 Review comment:
   What should the format of this be ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058069
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
+proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+http_class: Optional custom Http class to use.
+**kwargs: Arguments to pass to http_class constructor.
+  Returns:
+An initialized httplib2.Http instance.
+  """
+  proxy_info = httplib2.ProxyInfo(
+proxy_type=3,
+proxy_host=None,
+proxy_port=None,
+proxy_user=None,
+proxy_pass=None,
+proxy_rdns=None
+  )
+
+  for proxy_env_var in ['http_proxy', 'HTTP_PROXY', 'https_proxy', 
'HTTPS_PROXY']:
 
 Review comment:
   Why do we have to mention all these values here ? Isn't it enough to use a 
single variable (say, http_proxy) and ask users to always use that ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058006
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
 
 Review comment:
   Log a warning or raise an error ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153057990
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
 
 Review comment:
   Log a warning that proxy_env_var is ignored.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jbonofre commented on issue #3808: [BEAM-1920] Add a Spark 2.x support in the Spark runner

2017-11-26 Thread GitBox
jbonofre commented on issue #3808: [BEAM-1920] Add a Spark 2.x support in the 
Spark runner
URL: https://github.com/apache/beam/pull/3808#issuecomment-347089270
 
 
   Following the vote on the mailing lists, I updated the PR with Spark 2.x 
update only (no support of Spark 1 anymore).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-27 Thread GitBox
charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153152127
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
+proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
 
 Review comment:
   
   http_class and kwargs are never used in any call to `GetNewHttp`.  Please 
remove these arguments, as they are not necessary.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-27 Thread GitBox
charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153152125
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
+proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+http_class: Optional custom Http class to use.
+**kwargs: Arguments to pass to http_class constructor.
+  Returns:
+An initialized httplib2.Http instance.
+  """
+  proxy_info = httplib2.ProxyInfo(
+proxy_type=3,
+proxy_host=None,
+proxy_port=None,
+proxy_user=None,
+proxy_pass=None,
+proxy_rdns=None
+  )
+
+  for proxy_env_var in ['http_proxy', 'HTTP_PROXY', 'https_proxy', 
'HTTPS_PROXY']:
+if proxy_env_var in os.environ and os.environ[proxy_env_var]:
+  proxy_info = ProxyInfoFromEnvironmentVar(proxy_env_var)
+  break
+
+  # Use a non-infinite SSL timeout to avoid hangs during network flakiness.
+  kwargs['timeout'] = DEFAULT_HTTP_TIMEOUT_SECONDS
 
 Review comment:
   
   kwargs is unnecessary--just pass the `timeout` kwarg directly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-27 Thread GitBox
charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153152122
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+proxy_env_var: Environment variable string to read, such as http_proxy or
+   https_proxy.
+  Returns:
+httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+# proxy_info_from_url requires a protocol, which is always http or https.
+proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+http_class: Optional custom Http class to use.
+**kwargs: Arguments to pass to http_class constructor.
+  Returns:
+An initialized httplib2.Http instance.
+  """
+  proxy_info = httplib2.ProxyInfo(
+proxy_type=3,
+proxy_host=None,
+proxy_port=None,
+proxy_user=None,
+proxy_pass=None,
+proxy_rdns=None
+  )
+
+  for proxy_env_var in ['http_proxy', 'HTTP_PROXY', 'https_proxy', 
'HTTPS_PROXY']:
+if proxy_env_var in os.environ and os.environ[proxy_env_var]:
+  proxy_info = ProxyInfoFromEnvironmentVar(proxy_env_var)
+  break
+
+  # Use a non-infinite SSL timeout to avoid hangs during network flakiness.
+  kwargs['timeout'] = DEFAULT_HTTP_TIMEOUT_SECONDS
+  http = http_class(proxy_info=proxy_info, **kwargs)
+  return http
 
 Review comment:
   
   Directly return the object (no intermediate variable `http` necessary).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-27 Thread GitBox
charlesccychen commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153152130
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
 
 Review comment:
   
   Please use PEP8 function names thoughout 
(https://www.python.org/dev/peps/pep-0008/#function-names).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] aaltay commented on issue #4176: [BEAM-3143] Type Inference Compatibility with Python 3

2017-11-27 Thread GitBox
aaltay commented on issue #4176: [BEAM-3143] Type Inference Compatibility with 
Python 3
URL: https://github.com/apache/beam/pull/4176#issuecomment-347254939
 
 
   cc: @robertwb 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] mdvorsky commented on a change in pull request #4143: Faster implementation of nexmark.Generator.nextExactString().

2017-11-27 Thread GitBox
mdvorsky commented on a change in pull request #4143: Faster implementation of 
nexmark.Generator.nextExactString().
URL: https://github.com/apache/beam/pull/4143#discussion_r153266699
 
 

 ##
 File path: 
sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/Generator.java
 ##
 @@ -353,8 +353,16 @@ private static String nextString(Random random, int 
maxLength) {
   /** Return a random string of exactly {@code length}. */
   private static String nextExactString(Random random, int length) {
 StringBuilder sb = new StringBuilder();
+int rnd = 0;
+int n = 0;  // number of random characters left in rnd
 while (length-- > 0) {
-  sb.append((char) ('a' + random.nextInt(26)));
+  if (n == 0) {
+rnd = random.nextInt();
+n = 6;  // log_26(2^31)
+  }
+  sb.append((char) ('a' + rnd % 26));
+  rnd /= 26;
 
 Review comment:
   The way I imagine it is that we're generating a number in base 26. We used 
to generate it one  "digit" at a time, now we generate 6 digits at the same 
time, so the probability of a given resulting string should be roughly the 
same. E.g., now generating rnd=0 should be the same as previously generating 0 
six times in a row. 
   
   To make it exactly the same, I'd need to use 26^6 as the upper value for 
rnd, instead of Integer.MAX_VALUE. I could make that change if you'd like. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4173: [DISCUSS] add a java profile to be able to skip python/go when not relevant for current work

2017-11-27 Thread GitBox
jkff commented on issue #4173: [DISCUSS] add a java profile to be able to skip 
python/go when not relevant for current work
URL: https://github.com/apache/beam/pull/4173#issuecomment-347278456
 
 
   Seems like a good idea, though please bring it up on the mailing list - I 
think it's been discussed before and it'd be good if people who participated in 
that discussion chimed in. As a workaround, I typically use `-pl sdks/java/core 
-am -amd`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153282900
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Is this DoFn really needed? I'm wondering if your implementation of the 
combiner is sufficient for performance.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153282640
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
+private long limit;
 
-public SampleAnyDoFn(long limit, PCollectionView> 
iterableView) {
+public SampleAnyDoFn(long limit) {
   this.limit = limit;
-  this.iterableView = iterableView;
 }
 
 @ProcessElement
 public void processElement(ProcessContext c) {
-  for (T i : c.sideInput(iterableView)) {
-if (limit-- <= 0) {
-  break;
-}
-c.output(i);
+  if (--limit >= 0) {
 
 Review comment:
   Please make sure to reset the limit in startBundle/finishBundle.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #3808: [BEAM-1920] Add a Spark 2.x support in the Spark runner

2017-11-27 Thread GitBox
jkff commented on issue #3808: [BEAM-1920] Add a Spark 2.x support in the Spark 
runner
URL: https://github.com/apache/beam/pull/3808#issuecomment-347279993
 
 
   Please ping the thread when the PR is ready for review. I think you said on 
the mailing list that some tests are still pending? And, since this is a major 
change, please also run the ValidatesRunner tests and I think it'd be prudent 
to perform the typical manual release validation steps (quickstart, game 
examples) against a real Spark cluster.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff opened a new pull request #4177: [BEAM-2870] Strips partition decorators when creating/patching tables in batch

2017-11-27 Thread GitBox
jkff opened a new pull request #4177: [BEAM-2870] Strips partition decorators 
when creating/patching tables in batch
URL: https://github.com/apache/beam/pull/4177
 
 
   Addresses 
https://stackoverflow.com/questions/47351578/create-dynamic-side-outputs-in-apache-beam-dataflow?noredirect=1#comment81973314_47351578
   
   R: @chamikaramj 
   CC: @reuvenlax 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kamszPolidea commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
kamszPolidea commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153293304
 
 

 ##
 File path: 
sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 ##
 @@ -100,4 +100,14 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("Google cloud storage - bucket_name/path")
+  String getGcsLocation();
 
 Review comment:
   We can use `--filenamePrefix`, but then we need to provide full 
communication scheme there for GCS or HDFS, for instance 
`gs://bucket/path/file` or `hdfs://hadoop-master:port/dfs-path/file`. If we 
assume that user running tests will know it then those two gcsLocation and 
hdfsLocation could be ommited. This is basically implementation of our proposal 
https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#heading=h.29mfbxd6kc64
 . Do you think would be better to remove those two pipeline options and just 
depend on pipelinePrefix ? Should I also remove protocol resolving part then ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kamszPolidea commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
kamszPolidea commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153293304
 
 

 ##
 File path: 
sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 ##
 @@ -100,4 +100,14 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("Google cloud storage - bucket_name/path")
+  String getGcsLocation();
 
 Review comment:
   We can use `--filenamePrefix`, but then we need to provide full 
communication scheme there for GCS or HDFS, for instance 
`gs://bucket/path/file` or `hdfs://hadoop-master:port/dfs-path/file`. If we 
assume that user running tests will know it then those two gcsLocation and 
hdfsLocation could be ommited. This is basically implementation of our proposal 
https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#heading=h.29mfbxd6kc64
 . Do you think would be better to remove those two pipeline options and just 
depend on pipelinePrefix ? Should I also remove protocol resolving part then ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
szewi commented on a change in pull request #4169: [BEAM-3060] Added support 
for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153293511
 
 

 ##
 File path: 
sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 ##
 @@ -100,4 +100,14 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("Google cloud storage - bucket_name/path")
+  String getGcsLocation();
 
 Review comment:
   We can use `--filenamePrefix`, but then we need to provide full 
communication scheme there for GCS or HDFS, for instance 
`gs://bucket/path/file` or `hdfs://hadoop-master:port/dfs-path/file`. If we 
assume that user running tests will know it then those two gcsLocation and 
hdfsLocation could be ommited. This is basically implementation of our proposal 
https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#heading=h.29mfbxd6kc64
 . Do you think would be better to remove those two pipeline options and just 
depend on pipelinePrefix ? Should I also remove protocol resolving part then ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
szewi commented on a change in pull request #4169: [BEAM-3060] Added support 
for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153293998
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -139,6 +139,24 @@
 
 
 
+
+
+google-cloud-storage
+
+
+filesystem
+GCS
 
 Review comment:
   When provided -Dfilesystem=gcs it won't activate this profile. We should 
make decision whether uppercased or lowercased value of property is better. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
szewi commented on a change in pull request #4169: [BEAM-3060] Added support 
for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153293998
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -139,6 +139,24 @@
 
 
 
+
+
+google-cloud-storage
+
+
+filesystem
+GCS
 
 Review comment:
   When provided `-Dfilesystem=gcs` it won't activate this profile. We should 
make decision whether uppercased or lowercased value of property is better. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
szewi commented on a change in pull request #4169: [BEAM-3060] Added support 
for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153293511
 
 

 ##
 File path: 
sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 ##
 @@ -100,4 +100,14 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("Google cloud storage - bucket_name/path")
+  String getGcsLocation();
 
 Review comment:
   We can use `--filenamePrefix`, but then we need to provide full 
communication scheme there for GCS or HDFS, for instance 
`gs://bucket/path/file` or `hdfs://hadoop-master:port/dfs-path/file`. If we 
assume that user running tests will know it then those two gcsLocation and 
hdfsLocation could be ommited. This is basically implementation of our proposal 
https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#heading=h.29mfbxd6kc64
 . Do you think would be better to remove those two pipeline options and just 
depend on filenamePrefix ? Should I also remove protocol resolving part then ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] szewi commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
szewi commented on a change in pull request #4169: [BEAM-3060] Added support 
for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153296166
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -81,13 +81,38 @@ public static void setup() throws ParseException {
 .as(IOTestPipelineOptions.class);
 
 numberOfTextLines = options.getNumberOfRecords();
-filenamePrefix = appendTimestamp(options.getFilenamePrefix());
+filenamePrefix = resolveProtocolAndPath(options);
   }
 
   private static String appendTimestamp(String filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
+  private static String resolveProtocolAndPath(IOTestPipelineOptions options) {
 
 Review comment:
   Sure. However if in the future we will have kind of "validator" that will 
validate test input parameters, then it would reassemble `--filenamePrefix` I 
guess. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] angoenka opened a new pull request #4178: [BEAM-3239] Adding debug server to sdk worker to get threaddumps

2017-11-27 Thread GitBox
angoenka opened a new pull request #4178: [BEAM-3239] Adding debug server to 
sdk worker to get threaddumps
URL: https://github.com/apache/beam/pull/4178
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   Adding debug server to dump thread stack on http request


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn opened a new pull request #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn opened a new pull request #4179: Fix repository path for Python 
Jenkins builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179
 
 
   ?hanged in PR/3976
   
   This should fix BEAM-3120.
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347305706
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347309489
 
 
   @kennknowles @jasonkuster @lukecwik - Do we have a process to test groovy 
changes before the commit?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
kennknowles commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347310170
 
 
   You can `run seed job` and it will actually be installed right away. It can 
then be rolled back if broken. Other than that, you need to have a test Jenkins 
cluster, which is a bit of work.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reuvenlax commented on issue #4116: [BEAM-2953] Part 1 of Multipart advanced timeseries examples

2017-11-27 Thread GitBox
reuvenlax commented on issue #4116: [BEAM-2953] Part 1 of Multipart advanced 
timeseries examples
URL: https://github.com/apache/beam/pull/4116#issuecomment-347321463
 
 
   R: @kennknowles 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pabloem opened a new pull request #4180: Updating dataflow API version to newer release.

2017-11-27 Thread GitBox
pabloem opened a new pull request #4180: Updating dataflow API version to newer 
release.
URL: https://github.com/apache/beam/pull/4180
 
 
   r: @bjchambers 
   This new release contains the changes to the Dataflow API protos for side 
input inter transform IO.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347331807
 
 
   run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347331790
 
 
   Thanks, that's helpful.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347332806
 
 
   Run Python PostCommit
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347333086
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153329942
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Probably not, but that might change the semantics a bit. Without this, we'll 
combine each window into into `n` or fewer elements. But I guess that's more 
natural?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153330138
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   OTOH the iterable view approeach won't even work with unbounded collection 
so guess this new semantic is better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153332162
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Not sure why you say that: views are also per-window, so I think it 
shouldn't matter whether the collection is bounded or unbounded. (though, of 
course, it'll be behaving weirdly in case of multiple trigger firings - see 
also https://issues.apache.org/jira/browse/BEAM-2305, maybe similar issues 
apply here too)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153291513
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
 ##
 @@ -686,40 +756,38 @@ public int compare(
  */
 @VisibleForTesting
 @Experimental(Kind.FILESYSTEM)
-final void copyToOutputFiles(
+final void moveToOutputFiles(
 List, ResourceId>> 
resultsToFinalFilenames) throws IOException {
   int numFiles = resultsToFinalFilenames.size();
-  if (numFiles > 0) {
-LOG.debug("Copying {} files.", numFiles);
-List srcFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
-List dstFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
-for (KV, ResourceId> entry : 
resultsToFinalFilenames) {
-  srcFiles.add(entry.getKey().getTempFilename());
-  dstFiles.add(entry.getValue());
-  LOG.info(
-  "Will copy temporary file {} to final location {}",
-  entry.getKey().getTempFilename(),
-  entry.getValue());
-}
-// During a failure case, files may have been deleted in an earlier 
step. Thus
-// we ignore missing files here.
-FileSystems.copy(srcFiles, dstFiles, 
StandardMoveOptions.IGNORE_MISSING_FILES);
-  } else {
-LOG.info("No output files to write.");
+  LOG.debug("Copying {} files.", numFiles);
+  List srcFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
+  List dstFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
 
 Review comment:
   Seems overkill - they are created right next to each other in code, and 
FileSystems.copy() already does that verification. I removed the size hints to 
make it a little simpler (preallocation probably doesn't matter here).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153294218
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -1011,14 +1002,19 @@ public void processElement(ProcessContext c) throws 
Exception {
   writer.open(uuid, destination);
   writer.close();
   completeResults.add(
-  new FileResult<>(writer.getOutputFile(), shard, null, null, 
destination));
+  new FileResult<>(
+  writer.getOutputFile(),
+  shard,
+  GlobalWindow.INSTANCE,
+  PaneInfo.ON_TIME_AND_ONLY_FIRING,
 
 Review comment:
   Previously the code was structured differently, and the values passed in 
this particular codepath ended up being ignored. I consolidated things somewhat 
to handle much of windowed and unwindowed case the same way, and made the 
requirements more strict, in particular that window and pane have to be always 
set.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153293639
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -527,13 +521,8 @@ public void processElement(ProcessContext c, 
BoundedWindow window) throws Except
   writer.cleanup();
   throw e;
 }
-int shardNumber =
-shardNumberAssignment == ShardAssignment.ASSIGN_WHEN_WRITING
-? c.element().getKey().getShardNumber()
-: UNKNOWN_SHARDNUM;
-c.output(
-new FileResult<>(
-writer.getOutputFile(), shardNumber, window, c.pane(), 
entry.getKey()));
+int shard = c.element().getKey().getShardNumber();
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153291649
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -672,8 +661,11 @@ public void processElement(ProcessContext context) throws 
IOException {
 // PCollection. There is a dependency between this ParDo and the first (the
 // WriteOperation PCollection as a side input), so this will happen after 
the
 // initial ParDo.
-PCollection> results;
-final PCollectionView numShardsView;
+PCollectionView numShardsView =
+(computeNumShards == null) ? null : input.apply(computeNumShards);
+List> shardingSideInputs = numShardsView == null
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153291927
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -824,177 +826,78 @@ public void startBundle() {
 public void processElement(ProcessContext c) {
   fileResults.add(c.element());
   if (fixedNumShards == null) {
-if (numShardsView != null) {
-  fixedNumShards = c.sideInput(numShardsView);
-} else if (numShardsProvider != null) {
-  fixedNumShards = numShardsProvider.get();
-} else {
-  throw new IllegalStateException(
-  "When finalizing a windowed write, should have set fixed 
sharding");
-}
+fixedNumShards = getFixedNumShards.apply(c);
+checkState(fixedNumShards != null, "Windowed write should have set 
fixed sharding");
 
 Review comment:
   After https://github.com/apache/beam/pull/4124 they do - see 
https://github.com/apache/beam/pull/4137 for explanation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153334683
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   With the current implementation, I got `java.lang.IllegalArgumentException: 
Attempted to get side input window for GlobalWindow from non-global WindowFn` 
with the following snippet. Am I doing something wrong?
   
   ```java
   public class Test {
 public static void main(String[] args) {
   PipelineOptions options = PipelineOptionsFactory.create();
   Pipeline p = Pipeline.create(options);
   
   p.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
   .apply(ParDo.of(new DoFn() {
 @ProcessElement
 public void processElement(ProcessContext c) {
   Integer x = c.element();
   c.outputWithTimestamp(x, new Instant(x * 1));
 }
   }))
   .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1
   .apply(Sample.any(3));
   p.run();
 }
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-27 Thread GitBox
jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes 
from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347341595
 
 
   That would be optimal, and in general I really like the idea of capturing 
the stack trace of transform application and using it in all error messages. We 
already do this in PAssert, but it can be applied much more widely - e.g. 
errors during coder inference, validation errors, some runtime errors etc.
   
   I suspect that implementing that can have some unexpected stumbling blocks, 
so I would recommend to start with a simpler improvement - just make the error 
message clearly say that the fix is to specify name in apply().
   
   However, if you're willing to drive an effort to use application stack 
traces in more places in Beam, starting with a discussion on the mailing list - 
that would be a wonderful contribution and I'd be happy to review it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153335970
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   This seems like (another) bug in Sample.any. It should use 
Combine.globally().withoutDefaults() rather than simply Combine.globally().


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4139: Update cloud spanner library to 0.29.0

2017-11-27 Thread GitBox
jkff commented on issue #4139: Update cloud spanner library to 0.29.0
URL: https://github.com/apache/beam/pull/4139#issuecomment-347342299
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4139: Update cloud spanner library to 0.29.0

2017-11-27 Thread GitBox
jkff commented on issue #4139: Update cloud spanner library to 0.29.0
URL: https://github.com/apache/beam/pull/4139#issuecomment-347342562
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


  1   2   3   >