Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-24 Thread Reuven Lax
I am not an owner or maintainer of the PyPI package, so I'm not sure if
I'll be able to release Python artifacts.

On Fri, Nov 24, 2017 at 10:08 PM, Jean-Baptiste Onofré wrote:

> Awesome, Thanks Reuven !
>
> Regards
> JB
>
> On 11/25/2017 07:01 AM, Reuven Lax wrote:
>
>> I'll go ahead and send the RESULT email right now.

Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-24 Thread Jean-Baptiste Onofré

Awesome, Thanks Reuven !

Regards
JB

On 11/25/2017 07:01 AM, Reuven Lax wrote:

I'll go ahead and send the RESULT email right now.




[RESULT] [VOTE] Release 2.2.0, release candidate #4

2017-11-24 Thread Reuven Lax
I'm happy to announce that we have unanimously approved this release.

There are 9 approving votes, 5 of which are binding:
* Lukasz Cwik (binding)
* Romain Manni-Bucau (non binding)
* Jean-Baptiste Onofré (binding)
* Ahmet Altay (binding)
* Robert Bradshaw (binding)
* Konstantinos Katsiapis (non binding)
* Max Barrios (non binding)
* Kenneth Knowles (binding)
* Thomas Weise (non binding)

There are no disapproving votes.

Thanks everyone!

Reuven


Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-24 Thread Reuven Lax
I'll go ahead and send the RESULT email right now.



On Fri, Nov 24, 2017 at 9:56 PM, Jean-Baptiste Onofré wrote:

> It's not synchronous: promoting the artifacts takes some time (at least 30
> minutes). So, the artifacts will be on Central after a certain time.
>
> I confirm that it's OK because now the artifacts are on Central:
>
> http://repo.maven.apache.org/maven2/org/apache/beam/beam-sdks-java-core/2.2.0/
>
> By the way, you are promoting the artifacts to Central but I didn't see
> any [RESULT] e-mail on the vote thread. You have first to close the vote,
> then promote the artifacts, announce the release, etc.
>
> Regards
> JB

[Build] Enforce failed on project beam-sdks-java-io-tika

2017-11-24 Thread Manu Zhang
Hi all,

Has anyone seen this issue when building the latest master on Mac?

*[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce)
on project beam-sdks-java-io-tika: Execution enforce of goal
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce failed.
NullPointerException -> [Help 1]*

Thanks,
Manu
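
A temporary workaround, assuming the failure is the enforcer NPE itself and
not a genuine rule violation, is to skip the plugin for a local build:

  # skips all maven-enforcer-plugin checks for this one invocation
  mvn clean verify -Denforcer.skip=true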


RE: Azure(ADLS) compatibility on Beam with Spark runner

2017-11-24 Thread Milan Chandna
Hi JB,

Thanks for the updates.
BTW, I am at Microsoft myself, but I am trying this out of my own interest.
And it's good to know that someone else is also working on this.

-Milan.

-Original Message-
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net] 
Sent: Thursday, November 23, 2017 1:47 PM
To: dev@beam.apache.org
Subject: Re: Azure(ADLS) compatibility on Beam with Spark runner

The Azure guys tried to use ADLS via Beam HDFS filesystem, but it seems they 
didn't succeed.
The new approach we plan is to directly use the ADLS API.

I keep you posted.

Regards
JB

On 11/23/2017 07:42 AM, Milan Chandna wrote:
> I tried both ways.
> Passed the ADL-specific configuration in --hdfsConfiguration as well, and
> have set up core-site.xml/hdfs-site.xml as well.
> As I mentioned, it's an HDI + Spark cluster, so those things are already
> set up.
> A Spark job (without Beam) is also able to read and write to ADLS on the
> same machine.
> 
> BTW, if authentication or understanding ADL were the problem, it would
> have thrown an error like ADLFileSystem missing, or access failed, or
> something. Thoughts?
> 
> -Milan.
> 
> -Original Message-
> From: Lukasz Cwik [mailto:lc...@google.com.INVALID]
> Sent: Thursday, November 23, 2017 5:05 AM
> To: dev@beam.apache.org
> Subject: Re: Azure(ADLS) compatibility on Beam with Spark runner
> 
> In your example it seems as though your HDFS configuration doesn't contain 
> any ADL specific configuration:  "--hdfsConfiguration='[{\"fs.defaultFS\":
> \"hdfs://home/sample.txt\"]'"
> Do you have a core-site.xml or hdfs-site.xml configured as per:
> https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html
> 
>  From the documentation for --hdfsConfiguration:
> A list of Hadoop configurations used to configure zero or more Hadoop 
> filesystems. By default, Hadoop configuration is loaded from 'core-site.xml' 
> and 'hdfs-site.xml' based upon the HADOOP_CONF_DIR and YARN_CONF_DIR 
> environment variables. To specify configuration on the command-line, 
> represent the value as a JSON list of JSON maps, where each map represents 
> the entire configuration for a single Hadoop filesystem. For example 
> --hdfsConfiguration='[{\"fs.default.name\":
> \"hdfs://localhost:9998\", ...},{\"fs.default.name\": \"s3a://\", ...},...]'
> From:
> https://github.com/apache/beam/blob/9f81fd299bd32e0d6056a7da9fa994cf74db0ed9/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L45
> 
> On Wed, Nov 22, 2017 at 1:12 AM, Jean-Baptiste Onofré 
> 
> wrote:
> 
>> Hi,
>>
>> FYI, I'm in touch with Microsoft Azure team about that.
>>
>> We are testing the ADLS support via HDFS.
>>
>> I keep you posted.
>>
>> Regards
>> JB
>>
>> On 11/22/2017 09:12 AM, Milan Chandna wrote:
>>
>>> Hi,
>>>
>>> Has anyone tried IO from (to) an ADLS account on Beam with the Spark runner?
>>> I was trying this recently but was unable to make it work.
>>>
>>> Steps that I tried:
>>>
>>> 1.  Took HDI + Spark 1.6 cluster with default storage as ADLS account.
>>> 2.  Built Apache Beam on that. Built to include the BEAM-2790
>>> <https://issues.apache.org/jira/browse/BEAM-2790> fix, which I was
>>> earlier facing for ADL as well.
>>> 3.  Modified WordCount.java example to use HadoopFileSystemOptions
>>> 4.  Since HDI + Spark cluster has ADLS as defaultFS, tried 2 things
>>>*   Just gave the input path and output path as
>>> adl://home/sample.txt and adl://home/output
>>>*   In addition to adl input and output path, also gave required
>>> HDFS configuration with adl required configs as well.
>>>
>>> Both didn't work, btw.
>>> 
>>> 1.  Have checked ACLs and permissions. In fact, a similar job with the
>>> same paths works on Spark directly.
>>> 2.  Issues faced:
>>>*   For input, Beam is not able to find the path. Console log:
>>> Filepattern adl://home/sample.txt matched 0 files with total size 0
>>>*   Output path always gets converted to a relative path, something
>>> like this: /home/user1/adl:/home/output/.tmp
>>>
>>>
>>>
>>>
>>>
>>> Debugging more into this but was checking if someone is 
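
For concreteness, the ADL-specific configuration being discussed above would
look roughly like the following sketch (the property names come from the
hadoop-azure-datalake documentation linked earlier; all values are
placeholders, and the exact set Beam's HDFS filesystem needs was still being
worked out in this thread):

--hdfsConfiguration='[{
  "fs.defaultFS": "adl://youraccount.azuredatalakestore.net",
  "fs.adl.impl": "org.apache.hadoop.fs.adl.AdlFileSystem",
  "fs.AbstractFileSystem.adl.impl": "org.apache.hadoop.fs.adl.Adl",
  "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
  "dfs.adls.oauth2.client.id": "<application-id>",
  "dfs.adls.oauth2.credential": "<application-secret>",
  "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}]'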

[GitHub] reuvenlax commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-24 Thread GitBox
reuvenlax commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r152652760
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -824,177 +826,78 @@ public void startBundle() {
 public void processElement(ProcessContext c) {
   fileResults.add(c.element());
   if (fixedNumShards == null) {
-    if (numShardsView != null) {
-      fixedNumShards = c.sideInput(numShardsView);
-    } else if (numShardsProvider != null) {
-      fixedNumShards = numShardsProvider.get();
-    } else {
-      throw new IllegalStateException(
-          "When finalizing a windowed write, should have set fixed sharding");
-    }
+    fixedNumShards = getFixedNumShards.apply(c);
+    checkState(fixedNumShards != null, "Windowed write should have set fixed sharding");
 
 Review comment:
   Windowed (non-triggered) writes in batch do not need fixed sharding.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] reuvenlax commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-24 Thread GitBox
reuvenlax commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153034261
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -339,50 +297,189 @@ public boolean isWindowedWrites() {
         sink, computeNumShards, numShardsProvider, true, maxNumWritersPerBundle, sideInputs);
   }
 
-  private static class WriterKey<DestinationT> {
-    private final BoundedWindow window;
-    private final PaneInfo paneInfo;
-    private final DestinationT destination;
+  @Override
+  public void validate(PipelineOptions options) {
+    sink.validate(options);
+  }
 
-    WriterKey(BoundedWindow window, PaneInfo paneInfo, DestinationT destination) {
-      this.window = window;
-      this.paneInfo = paneInfo;
-      this.destination = destination;
+  @Override
+  public WriteFilesResult<DestinationT> expand(PCollection<UserT> input) {
+    if (input.isBounded() == IsBounded.UNBOUNDED) {
+      checkArgument(
+          windowedWrites,
+          "Must use windowed writes when applying %s to an unbounded PCollection",
+          WriteFiles.class.getSimpleName());
+    }
+    if (windowedWrites) {
+      // The reason for this is https://issues.apache.org/jira/browse/BEAM-1438
+      // and similar behavior in other runners.
+      checkArgument(
+          computeNumShards != null || numShardsProvider != null,
+          "When using windowed writes, must specify number of output shards explicitly",
+          WriteFiles.class.getSimpleName());
     }
+    this.writeOperation = sink.createWriteOperation();
+    this.writeOperation.setWindowedWrites(windowedWrites);
 
-    @Override
-    public boolean equals(Object o) {
-      if (!(o instanceof WriterKey)) {
-        return false;
-      }
-      WriterKey other = (WriterKey) o;
-      return Objects.equal(window, other.window)
-          && Objects.equal(paneInfo, other.paneInfo)
-          && Objects.equal(destination, other.destination);
+    if (!windowedWrites) {
+      // Re-window the data into the global window and remove any existing triggers.
+      input =
+          input.apply(
+              "RewindowIntoGlobal",
+              Window.<UserT>into(new GlobalWindows())
+                  .triggering(DefaultTrigger.of())
+                  .discardingFiredPanes());
+    }
+
+    Coder<DestinationT> destinationCoder;
+    try {
+      destinationCoder =
+          getDynamicDestinations()
+              .getDestinationCoderWithDefault(input.getPipeline().getCoderRegistry());
+      destinationCoder.verifyDeterministic();
+    } catch (CannotProvideCoderException | NonDeterministicException e) {
+      throw new RuntimeException(e);
+    }
+    @SuppressWarnings("unchecked")
+    Coder<BoundedWindow> windowCoder =
+        (Coder<BoundedWindow>) input.getWindowingStrategy().getWindowFn().windowCoder();
+    FileResultCoder<DestinationT> fileResultCoder =
+        FileResultCoder.of(windowCoder, destinationCoder);
+
+    PCollectionView<Integer> numShardsView =
+        (computeNumShards == null) ? null : input.apply(computeNumShards);
+
+    PCollection<FileResult<DestinationT>> tempFileResults =
+        (computeNumShards == null && numShardsProvider == null)
+            ? input.apply(
+                "WriteUnshardedBundlesToTempFiles",
 
 Review comment:
   Unfortunately, refactoring into new PTransforms changes the name of every
single sub-step (since step names are hierarchical).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-24 Thread Reuven Lax
Appears to be a problem :)

I tried publishing the latest artifact from Apache Nexus to Maven Central.
After clicking publish, Nexus claimed that the operation had completed.
However, a look at the Maven Central page
(https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-core)
does not show 2.2.0 artifacts, and the staging repository has now vanished
from the Nexus site! Does anyone know what happened here?

Reuven

On Wed, Nov 22, 2017 at 11:04 PM, Thomas Weise  wrote:

> +1
>
> Ran the quickstart with the Apex runner in embedded mode and on YARN.
>
> It needed a couple of tweaks to get there, though.
>
> 1) Change quickstart pom.xml apex-runner profile:
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-yarn-client</artifactId>
>   <version>${hadoop.version}</version>
>   <scope>runtime</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>${hadoop.version}</version>
>   <scope>runtime</scope>
> </dependency>
>
> 2) After copying the fat jar to the cluster:
>
> java -cp word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
>     --inputFile=file:///tmp/input.txt --output=/tmp/counts \
>     --embeddedExecution=false --configFile=beam-runners-apex.properties \
>     --runner=ApexRunner
>
> (this was on a single-node cluster, hence the local file path)
>
> The quickstart instructions suggest using *mvn exec:java* instead of plain
> *java* - it generally isn't valid to assume that mvn and a build
> environment exist on the edge node of a YARN cluster.
>
>
>
> On Wed, Nov 22, 2017 at 2:12 PM, Nishu  wrote:
>
> > Hi Eugene,
> >
> > I ran it on both standalone Flink (non-YARN) and Flink on an HDInsight
> > cluster (YARN). Both ran successfully. :)
> >
> > Regards,
> > Nishu
> >
> >
> > On Wed, Nov 22, 2017 at 9:40 PM, Eugene Kirpichov <
> > kirpic...@google.com.invalid> wrote:
> >
> > > Thanks Nishu. So, if I understand correctly, your pipelines were
> > > running on non-YARN, but you're planning to run with YARN?
> > >
> > > I meanwhile was able to get Flink running on Dataproc (YARN), and
> > > validated quickstart and game examples.
> > > At this point we need validation for Spark and Flink non-YARN [I think
> > > if Nishu's runs were non-YARN, they'd give us enough confidence,
> > > combined with the success of other validations of Spark and Flink
> > > runners?], and Apex on YARN. However, it seems that in previous RCs we
> > > were not validating Apex on YARN, only local cluster. Is it needed
> > > this time?
> > >
> > > On Wed, Nov 22, 2017 at 12:28 PM Nishu  wrote:
> > >
> > > > Hi Eugene,
> > > >
> > > > No, I didn't try with those; instead I have my custom pipeline where
> > > > a Kafka topic is the source. I have defined a Global Window and a
> > > > processing-time trigger to read the data. Further it runs some
> > > > transformations, i.e. GroupByKey and CoGroupByKey, on the windowed
> > > > collections.
> > > > I was running the same pipeline on direct runner and spark runner
> > > > earlier. Today gave it a try with Flink on YARN.
> > > >
> > > > Best Regards,
> > > > Nishu.
> > > >
> > > >
> > > > On Wed, Nov 22, 2017 at 8:07 PM, Eugene Kirpichov <
> > > > kirpic...@google.com.invalid> wrote:
> > > >
> > > > > Thanks Nishu! Can you clarify which pipeline you were running?
> > > > > The validation spreadsheet includes 1) the quickstart and 2) mobile
> > > > > game walkthroughs. Was it one of these, or your custom pipeline?
> > > > >
> > > > > On Wed, Nov 22, 2017 at 10:20 AM Nishu wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Typo in previous mail.  I meant Flink runner.
> > > > > >
> > > > > > Thanks,
> > > > > > Nishu
> > > > > > On Wed, 22 Nov 2017 at 19.17,
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I built a pipeline using RC 2.2 today and ran it with the
> > > > > > > runner on YARN.
> > > > > > > It worked seamlessly for unbounded sources. Couldn't see any
> > > > > > > issues with my pipeline so far :)
> > > > > > >
> > > > > > >
> > > > > > > Thanks, Nishu
> > > > > > >
> > > > > > > On Wed, 22 Nov 2017 at 18.57, Reuven Lax wrote:
> > > > > > >
> > > > > > >> Who is validating Flink and Yarn?
> > > > > > >>
> > > > > > >> 

[GitHub] iemejia commented on a change in pull request #4174: [BEAM-3244] Ensure execution of teardown method on Flink's DoFnOperator

2017-11-24 Thread GitBox
iemejia commented on a change in pull request #4174: [BEAM-3244] Ensure 
execution of teardown method on Flink's DoFnOperator
URL: https://github.com/apache/beam/pull/4174#discussion_r153012280
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
 ##
 @@ -380,7 +386,6 @@ public void close() throws Exception {
   }
 }
 checkFinishBundleTimer.cancel(true);
 
 Review comment:
   @aljoscha I had a doubt whether this one should be moved to dispose() too,
given that close() may end up not being called. WDYT?
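   
   A minimal sketch of what moving it would look like (assuming Flink's
   AbstractStreamOperator lifecycle, where dispose() also runs on the failure
   path that skips close(); names follow the surrounding diff, and the final
   code may differ):
   
   @Override
   public void dispose() throws Exception {
     try {
       super.dispose();
     } finally {
       // Unlike close(), dispose() is invoked on both success and failure,
       // so cleanup placed here cannot be skipped.
       if (checkFinishBundleTimer != null) {
         checkFinishBundleTimer.cancel(true);
       }
       doFnInvoker.invokeTeardown();
     }
   }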


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia opened a new pull request #4174: [BEAM-3244] Ensure execution of teardown method on Flink's DoFnOperator

2017-11-24 Thread GitBox
iemejia opened a new pull request #4174: [BEAM-3244] Ensure execution of 
teardown method on Flink's DoFnOperator
URL: https://github.com/apache/beam/pull/4174
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [x] Each commit in the pull request should have a meaningful subject line 
and body.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872620
 
 
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872611
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346871884
 
 
   Run Flink ValidatesRunner
   Run Spark ValidatesRunner
   Run Dataflow ValidatesRunner
   Run Apex ValidatesRunner
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872593
 
 
   Run Apex ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872574
 
 
   Run Spark ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346703990
 
 
   Run Spark ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704033
 
 
   Run Apex ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346704040
 
 
   Run Gearpump ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346703983
 
 
   Run Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part of the ValidatesRunner set

2017-11-24 Thread GitBox
iemejia commented on issue #4170: Make ParDoLifecycleTest exception tests part 
of the ValidatesRunner set
URL: https://github.com/apache/beam/pull/4170#issuecomment-346872487
 
 
   Run Flink ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] xumingming commented on issue #4173: [DISCUSS] add a java profile to be able to skip python/go when not relevant for current work

2017-11-24 Thread GitBox
xumingming commented on issue #4173: [DISCUSS] add a java profile to be able to 
skip python/go when not relevant for current work
URL: https://github.com/apache/beam/pull/4173#issuecomment-346862839
 
 
   I like this idea.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-24 Thread GitBox
lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed 
TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152999363
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String filenamePrefix) {
     return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-    PCollection<String> testFilenames = pipeline
-        .apply("Generate sequence", GenerateSequence.from(0).to(numberOfTextLines))
-        .apply("Produce text lines", ParDo.of(new DeterministicallyConstructTestTextLineFn()))
-        .apply("Write content to files", TextIO.write().to(filenamePrefix).withOutputFilenames())
-        .getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
+  public static class UncompressedTextIOIT {
+
+    @Rule
+    public TestPipeline pipeline = TestPipeline.create();
+
+    @Test
+    public void writeThenReadAll() {
+      PCollection<String> testFilenames = pipeline
+          .apply("Generate sequence", GenerateSequence.from(0).to(numberOfTextLines))
+          .apply("Produce text lines", ParDo.of(new DeterministicallyConstructTestTextLineFn()))
+          .apply("Write content to files", TextIO.write().to(filenamePrefix).withOutputFilenames())
+          .getPerDestinationOutputFilenames().apply(Values.create());
+
+      PCollection<String> consolidatedHashcode = testFilenames
+          .apply("Read all files", TextIO.readAll())
+          .apply("Calculate hashcode", Combine.globally(new HashingFn()));
+
+      String expectedHash = getExpectedHashForLineCount(numberOfTextLines);
+      PAssert.thatSingleton(consolidatedHashcode).isEqualTo(expectedHash);
+
+      testFilenames.apply("Delete test files", ParDo.of(new DeleteFileFn())
+          .withSideInputs(consolidatedHashcode.apply(View.asSingleton())));
+
+      pipeline.run().waitUntilFinish();
+    }
+  }
+
+  /** IO IT with various compression types. */
+  @RunWith(Parameterized.class)
+  public static class CompressedTextIOIT {
+
+    @Rule
+    public TestPipeline pipeline = TestPipeline.create();
+
+    @Parameterized.Parameters()
+    public static Iterable<Compression> data() {
+      return ImmutableList.<Compression>builder()
+          .add(GZIP)
+          .add(DEFLATE)
+          .add(BZIP2)
+          .build();
+    }
+
+    @Parameterized.Parameter()
+    public Compression compression;
+
+    @Test
+    public void writeThenReadAllWithCompression() {
+      TextIO.TypedWrite write = TextIO
+          .write()
+          .to(filenamePrefix)
+          .withOutputFilenames()
+          .withCompression(compression);
+
+      TextIO.ReadAll read = TextIO.readAll().withCompression(AUTO);
 
-    PCollection<String> consolidatedHashcode = testFilenames
-        .apply("Read all files", TextIO.readAll())
-        .apply("Calculate hashcode", Combine.globally(new HashingFn()));
+      PCollection<String> testFilenames = pipeline
 
 Review comment:
   I think it's hard to do right now without modifying perfkit's code. As we
checked, perfkit ignores -D parameters because it builds the mvn verify
command itself from the parameters passed. I think this could be done in some
future contribution. We will file a bug report in perfkit soon.
   
   I think the best solution (at least for now) is to leave the compression
type in pipeline options. We pass them to perfkit either way (through
`beam_it_options`) and, what imo is more important, compressionType is very
test-specific (same as numberOfRecords). WDYT?
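   
   For illustration, keeping it in pipeline options means an invocation along
   these lines (a sketch only - the system property and option names here are
   assumptions, not an agreed interface):
   
   mvn verify -Dit.test=TextIOIT \
       -DintegrationTestPipelineOptions='["--numberOfRecords=100000", "--compressionType=GZIP"]'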


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-24 Thread GitBox
lgajowy commented on a change in pull request #4149: [BEAM-3060] Add Compressed 
TextIOIT
URL: https://github.com/apache/beam/pull/4149#discussion_r152998058
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -83,25 +90,82 @@ private static String appendTimestamp(String filenamePrefix) {
     return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
-  @Test
-  public void writeThenReadAll() {
-    PCollection<String> testFilenames = pipeline
-        .apply("Generate sequence", GenerateSequence.from(0).to(numberOfTextLines))
-        .apply("Produce text lines", ParDo.of(new DeterministicallyConstructTestTextLineFn()))
-        .apply("Write content to files", TextIO.write().to(filenamePrefix).withOutputFilenames())
-        .getPerDestinationOutputFilenames().apply(Values.create());
+  /** IO IT with no compression. */
+  @RunWith(JUnit4.class)
+  public static class UncompressedTextIOIT {
 
 Review comment:
   Yes, it works, but it runs all 4 tests that are in the file - and now I
think that is probably not what we want. This won't be a problem, though, as
you suggested an even better solution in the comment below.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [RESULT][VOTE] Migrate to gitbox

2017-11-24 Thread Kenneth Knowles
+1 for new mailing list (reviews@)

On Fri, Nov 24, 2017 at 5:20 AM, James  wrote:

> +1 for new mailing list (reviews@)
>
> On Thu, Nov 23, 2017 at 7:38 PM Ismaël Mejía  wrote:
>
> > If github already does the notifications, I think that having an extra
> > notifications/reviews mailing list could be overkill (or spammy).
> > However I can see the value of this for archival reasons, e.g. to
> > store the history of the project comments out of github for the
> > future.
> >
> > +1 for new mailing list (reviews@) or disabled
> >
> > I don't think that putting this in commits is a good idea. The commits
> > mailing list already has a good amount of stuff going on. I think
> >
> >
> > On Thu, Nov 23, 2017 at 12:17 PM, Jean-Baptiste Onofré 
> > wrote:
> > > Hi,
> > >
> > > following the migration to gitbox, we now have a notification e-mail
> > > (on the dev mailing list) for each action on a PR (comments, closing,
> > > etc).
> > >
> > > It could be very verbose and I think we have to change that. For now,
> > > I will ask to disable this notification.
> > >
> > > However, I think it's worth asking on the mailing list. Basically we
> > > have the following options:
> > >
> > > - send the notification to commits@ mailing list
> > > - send the notification to a new mailing list (like review@ mailing list)
> > > - leave the notification disabled
> > >
> > > Please, let me know what you prefer.
> > >
> > > Thanks
> > > Regards
> > > JB
> > >
> > >
> > > On 11/23/2017 11:19 AM, Jean-Baptiste Onofré wrote:
> > >>
> > >> The migration is done, you have to update your local copy with git
> > >> remote set-url to use gitbox.apache.org instead of git-wip-us.apache.org.
> > >>
> > >> I'm checking the GitHub PRs (if we now have the merge button).
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On 11/23/2017 10:55 AM, Jean-Baptiste Onofré wrote:
> > >>>
> > >>> Hi guys,
> > >>>
> > >>> I just got an update from INFRA: the migration to gitbox starts now.
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>> On 11/07/2017 05:51 PM, Jean-Baptiste Onofré wrote:
> > 
> >  Hi guys,
> > 
> >  quick update on the gitbox migration.
> > 
> >  I created a Jira for INFRA:
> > 
> >  https://issues.apache.org/jira/browse/INFRA-15456
> > 
> >  It should be done pretty soon.
> > 
> >  Regards
> >  JB
> > 
> >  On 10/23/2017 07:24 AM, Jean-Baptiste Onofré wrote:
> > >
> > > Hi all,
> > >
> > > this vote passed with only +1.
> > >
> > > I will request INFRA to move the repositories to gitbox.
> > >
> > > Thanks all for your vote !
> > >
> > > Regards
> > > JB
> > >
> > > On 10/10/2017 09:42 AM, Jean-Baptiste Onofré wrote:
> > >>
> > >> Hi all,
> > >>
> > >> following the discussion, here's the formal vote to migrate to
> > gitbox:
> > >>
> > >> [ ] +1, Approve to migrate to gitbox
> > >> [ ] -1, Do not migrate (please provide specific comments)
> > >>
> > >> The vote will be open for at least 36 hours. It is adopted by
> > majority
> > >> approval, with at least 3 PMC affirmative votes.
> > >>
> > >> Thanks,
> > >> Regards
> > >> JB
> > >
> > >
> > 
> > >>>
> > >>
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> >
>
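
(For reference, the remote switch described above is a one-liner, assuming
your remote is named "origin":
git remote set-url origin https://gitbox.apache.org/repos/asf/beam.git)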


Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-24 Thread Aljoscha Krettek
+1

> On 23. Nov 2017, at 23:22, Manu Zhang  wrote:
> 
> +1
> 
> On Thu, Nov 23, 2017 at 11:32 PM Maximilian Michels  wrote:
> 
>> +1
>> 
>> Thanks for looking into it!
>> 
>> On 23.11.17 00:25, Lukasz Cwik wrote:
>>> I have noticed that some e-mail addresses (notably @google.com) get
>>> .INVALID suffixed onto them, so per...@yyy.com becomes
>>> per...@yyy.com.INVALID in the From: header.
>>> 
>>> I have figured out that this is an issue with the way that our mail
>>> server is configured, and opened
>>> https://issues.apache.org/jira/browse/INFRA-15529.
>>> 
>>> For those of us that are impacted, it makes it more difficult for users
>>> to reply directly to the originator.
>>> 
>>> Infra has asked to get consensus from PMC members before making the
>>> change, which I figured would be easiest with a vote.
>>> 
>>> Please vote:
>>> +1 Update mail server to stop suffixing .INVALID
>>> -1 Don't change mail server settings.
>>> 
>> 



[GitHub] sduskis commented on a change in pull request #4171: [BEAM-3008] Extends API for BigtableIO Read and Write by adding withInstanceId and withProjectId

2017-11-24 Thread GitBox
sduskis commented on a change in pull request #4171: [BEAM-3008] Extends API 
for BigtableIO Read and Write by adding withInstanceId  and withProjectId 
URL: https://github.com/apache/beam/pull/4171#discussion_r152978149
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
 ##
 @@ -78,38 +78,38 @@
  * The Bigtable source returns a set of rows from a single table, returning a
  * {@code PCollection<Row>}.
  *
 - * To configure a Cloud Bigtable source, you must supply a table id and a {@link BigtableOptions}
 - * or builder configured with the project and other information necessary to identify the
 - * Bigtable instance. By default, {@link BigtableIO.Read} will read all rows in the table. The row
 - * range to be read can optionally be restricted using {@link BigtableIO.Read#withKeyRange}, and
 - * a {@link RowFilter} can be specified using {@link BigtableIO.Read#withRowFilter}. For example:
 + * To configure a Cloud Bigtable source, you must supply a table id, a project id, an instance
 + * id and optionally a {@link BigtableOptions} to provide more specific connection configuration.
 + * By default, {@link BigtableIO.Read} will read all rows in the table. The row range to be read
 + * can optionally be restricted using {@link BigtableIO.Read#withKeyRange}, and a {@link RowFilter}
 + * can be specified using {@link BigtableIO.Read#withRowFilter}. For example:
  *
  * {@code
 - * BigtableOptions.Builder optionsBuilder =
 - *     new BigtableOptions.Builder()
 - *         .setProjectId("project")
 - *         .setInstanceId("instance");
  *
  * Pipeline p = ...;
  *
  * // Scan the entire table.
  * p.apply("read",
  *     BigtableIO.read()
  *         .withBigtableOptions(optionsBuilder)
 
 Review comment:
   Can you please remove `.withBigtableOptions(optionsBuilder)` for this 
example?
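   
   Presumably the example then reduces to the new-style configuration, along
   these lines (a sketch based on the withProjectId/withInstanceId methods
   this PR adds; the literal values are placeholders):
   
   Pipeline p = ...;
   p.apply("read",
       BigtableIO.read()
           .withProjectId("project")
           .withInstanceId("instance")
           .withTableId("table"));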


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xumingming commented on issue #4168: [BEAM-3238][SQL] Add BeamRecordSqlTypeBuilder

2017-11-24 Thread GitBox
xumingming commented on issue #4168: [BEAM-3238][SQL] Add 
BeamRecordSqlTypeBuilder
URL: https://github.com/apache/beam/pull/4168#issuecomment-346831152
 
 
   I like this idea!
   
   One minor comment: can we put the `BeamRecordSqlTypeBuilder` inside
`BeamRecordSqlType`? It will keep the surface API of
`org.apache.beam.sdk.extensions.sql` cleaner.
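   
   Nested, a call site would presumably read along these lines (a sketch -
   the builder() entry point and the with*Field method names are
   illustrative, not confirmed against this PR):
   
   BeamRecordSqlType type =
       BeamRecordSqlType.builder()
           .withIntegerField("f_int")
           .withVarcharField("f_string")
           .build();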


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [RESULT][VOTE] Migrate to gitbox

2017-11-24 Thread James
+1 for new mailing list (reviews@)



[GitHub] rmannibucau opened a new pull request #4173: [DISCUSS] add a java profile to be able to skip python/go when not relevant for current work

2017-11-24 Thread GitBox
rmannibucau opened a new pull request #4173: [DISCUSS] add a java profile to be 
able to skip python/go when not relevant for current work
URL: https://github.com/apache/beam/pull/4173
 
 
   Often, when working on a feature - and even more so on a fix - you only care
about one language, which is probably Java most of the time.
   
   When building the project, the python execution time is significant (about
half of the total on my machine), even though you can be sure you didn't affect
it, since the code is quite parallel and almost unrelated in terms of
dependencies.
   
   This PR adds a java profile which skips the python/go SDKs when building. It
is designed to be activated through a property you can put in your settings.xml
if you only ever work on the java part of beam.
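   
   A minimal sketch of the shape of such a profile (illustrative only - the
   profile id, the activation property and the skip mechanism are assumptions,
   not necessarily what this PR implements):
   
   <profile>
     <id>java-only</id>
     <activation>
       <property>
         <!-- set beam.javaOnly=true in settings.xml to keep it always on -->
         <name>beam.javaOnly</name>
       </property>
     </activation>
     <properties>
       <!-- hypothetical switches that the python/go modules would honor -->
       <python.build.skip>true</python.build.skip>
       <go.build.skip>true</go.build.skip>
     </properties>
   </profile>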


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: gradle dirty files blocking maven build

2017-11-24 Thread Jean-Baptiste Onofré

Hi Romain,

I guess they are not part of the repo (git clean -x -f -d removes them), correct?

Let me try.

Thanks,
Regards
JB

On 11/24/2017 10:00 AM, Romain Manni-Bucau wrote:

Hi guys,

I don't really know if it comes from my gradle tests or the gradle
build itself, but I realized this morning that I had ".gogradle" files in
beam in a few places, and when building with maven the resource plugin
directory scanner goes through these files; it seems to loop, which
makes the build very slow in the best case and just locks it up in the
worst one.

Just in case you observe it, "find . -name '.gogradle' | xargs rm -Rf"
solves it.

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


gradle dirty files blocking maven build

2017-11-24 Thread Romain Manni-Bucau
Hi guys,

I don't really know if it comes from my gradle tests or the gradle
build itself, but I realized this morning that I had ".gogradle" files in
beam in a few places, and when building with maven the resource plugin
directory scanner goes through these files; it seems to loop, which
makes the build very slow in the best case and just locks it up in the
worst one.

Just in case you observe it, "find . -name '.gogradle' | xargs rm -Rf"
solves it.

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn