Awesome, thanks Reuven!
Regards
JB
On 11/25/2017 07:01 AM, Reuven Lax wrote:
I'll go ahead and send the RESULT email right now.
On Fri, Nov 24, 2017 at 9:56 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
It's not synchronous: promoting the artifacts takes some time (at least 30 minutes), so the artifacts show up on Central only after a delay.
I confirm that it's OK because the artifacts are now on Central:
http://repo.maven.apache.org/maven2/org/apache/beam/beam-sdks-java-core/2.2.0/
By the way, you are promoting the artifacts to Central, but I didn't see any [RESULT] e-mail on the vote thread. You first have to close the vote, then promote the artifacts, announce the release, etc.
Regards
JB
On 11/25/2017 12:43 AM, Reuven Lax wrote:
Appears to be a problem :)
I tried publishing the latest artifact from Apache Nexus to Maven Central. After clicking publish, Nexus claimed that the operation had completed. However, the Maven Central page (https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-core) does not show the 2.2.0 artifacts, and the staging repository has now vanished from the Nexus site! Does anyone know what happened here?
Reuven
On Wed, Nov 22, 2017 at 11:04 PM, Thomas Weise <t...@apache.org> wrote:
+1
Ran the quickstart with the Apex runner in embedded mode and on YARN. It needed a couple of tweaks to get there, though.
1) Change quickstart pom.xml apex-runner profile:
<!--
  Apex 3.6 is built against YARN 2.6. For this fat jar, the
  included version has to match what's on the cluster, hence
  we need to repeat the Apex Hadoop dependencies at the
  required version here.
-->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <scope>runtime</scope>
</dependency>
2) After copying the fat jar to the cluster:
java -cp word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
  --inputFile=file:///tmp/input.txt --output=/tmp/counts \
  --embeddedExecution=false --configFile=beam-runners-apex.properties \
  --runner=ApexRunner
(this was on a single-node cluster, hence the local file path)
The quickstart instructions suggest using *mvn exec:java* instead of *java*; it generally isn't valid to assume that mvn and a build environment exist on the edge node of a YARN cluster.
On Wed, Nov 22, 2017 at 2:12 PM, Nishu <nishuta...@gmail.com> wrote:
Hi Eugene,
I ran it on both standalone Flink (non-YARN) and Flink on an HDInsight cluster (YARN). Both ran successfully. :)
Regards,
Nishu
On Wed, Nov 22, 2017 at 9:40 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
Thanks Nishu. So, if I understand correctly, your pipelines were running on non-YARN, but you're planning to run with YARN?
I meanwhile was able to get Flink running on Dataproc (YARN), and validated the quickstart and game examples.
At this point we need validation for Spark and Flink non-YARN [I think if Nishu's runs were non-YARN, they'd give us enough confidence, combined with the success of other validations of the Spark and Flink runners?], and Apex on YARN. However, it seems that in previous RCs we were not validating Apex on YARN, only a local cluster. Is it needed this time?
On Wed, Nov 22, 2017 at 12:28 PM Nishu <nishuta...@gmail.com> wrote:
Hi Eugene,
No, I didn't try with those; instead I have my custom pipeline where a Kafka topic is the source. I have defined a global window and a processing-time trigger to read the data. Further, it runs some transformations, i.e. GroupByKey and CoGroupByKey, on the windowed collections.
I was running the same pipeline on the direct runner and Spark runner earlier. Today I gave it a try with Flink on YARN.
Best Regards,
Nishu.
On Wed, Nov 22, 2017 at 8:07 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
Thanks Nishu! Can you clarify which pipeline you were running? The validation spreadsheet includes 1) the quickstart and 2) the mobile game walkthroughs. Was it one of these, or your custom pipeline?
On Wed, Nov 22, 2017 at 10:20 AM Nishu <nishuta...@gmail.com> wrote:
Hi,
Typo in previous mail. I meant Flink runner.
Thanks,
Nishu
On Wed, 22 Nov 2017 at 19.17,
Hi,
I built a pipeline using RC 2.2 today and ran it with the runner on YARN. It worked seamlessly for unbounded sources. Couldn't see any issues with my pipeline so far :)
Thanks,
Nishu
On Wed, 22 Nov 2017 at 18.57, Reuven Lax <re...@google.com.invalid> wrote:
Who is validating Flink and YARN?
On Tue, Nov 21, 2017 at 9:26 AM, Kenneth Knowles <k...@google.com.invalid> wrote:
On Mon, Nov 20, 2017 at 5:01 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
In the verification spreadsheet, I'm not sure I understand the difference between the "YARN" and "Standalone cluster/service". Which is Dataproc? It definitely uses YARN, but it is also a standalone cluster/service. Does it count for both?
No, it doesn't. A number of runners have their own non-YARN cluster mode. I would expect that the launching experience might be different and the portable container management to differ. If they are identical, experts in those systems should feel free to coalesce the rows. Conversely, as other platforms become supported, they could be added or not based on whether they are substantively different from a user experience or QA point of view.
Kenn
Seems now we're missing just the Apex and Flink cluster verifications.
*though the Spark runner took 6x longer to run UserScore, partially I guess because it didn't do autoscaling (the Dataflow runner ramped up to 5 workers whereas the Spark runner used 2 workers). For some reason the Spark runner chose not to split the 10GB input files into chunks.
On Mon, Nov 20, 2017 at 3:46 PM Reuven Lax <re...@google.com.invalid> wrote:
Done
On Tue, Nov 21, 2017 at 3:08 AM, Robert Bradshaw <rober...@google.com.invalid> wrote:
Thanks. You need to re-sign as well.
On Mon, Nov 20, 2017 at 12:14 AM, Reuven Lax <re...@google.com.invalid> wrote:
FYI these generated files have been removed from the source distribution.
On Sat, Nov 18, 2017 at 9:09 AM, Reuven Lax <re...@google.com> wrote:
Hmmm, I thought I removed those generated files from the zip file before sending this email. Let me check again.
Reuven
On Sat, Nov 18, 2017 at 8:52 AM, Robert Bradshaw <rober...@google.com.invalid> wrote:
The source distribution contains a couple of files not on github (e.g. folders that were added on master, Python generated files). The pom files differed only by missing -SNAPSHOT; other than that, presumably the source release should just be "wget https://github.com/apache/beam/archive/release-2.2.0.zip"?
diff -rq apache-beam-2.2.0 beam/ | grep -v pom.xml
# OK?
Only in apache-beam-2.2.0: DEPENDENCIES
# Expected.
Only in beam/: .git
Only in beam/: .gitattributes
Only in beam/: .gitignore
# These folders are probably from switching around between master and git branches.
Only in apache-beam-2.2.0: model
Only in apache-beam-2.2.0/runners/flink: examples
Only in apache-beam-2.2.0/runners/flink: runner
Only in apache-beam-2.2.0/runners/gearpump: jarstore
Only in apache-beam-2.2.0/sdks/java/extensions: gcp-core
Only in apache-beam-2.2.0/sdks/java/extensions: sketching
Only in apache-beam-2.2.0/sdks/java/io: file-based-io-tests
Only in apache-beam-2.2.0/sdks/java/io: hdfs
Only in apache-beam-2.2.0/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources: src
Only in apache-beam-2.2.0/sdks/java/maven-archetypes/examples-java8/src/main/resources/archetype-resources: src
Only in apache-beam-2.2.0/sdks/java: microbenchmarks
# Here's the generated protos.
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_artifact_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_artifact_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_fn_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_fn_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_job_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_job_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_provision_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_provision_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_runner_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_runner_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: endpoints_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: endpoints_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: standard_window_fns_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: standard_window_fns_pb2.py
# And some other sdist generated Python files.
Only in apache-beam-2.2.0/sdks/python: .eggs
Only in apache-beam-2.2.0/sdks/python: LICENSE
Only in apache-beam-2.2.0/sdks/python: NOTICE
Only in apache-beam-2.2.0/sdks/python: README.md
Presumably we should just purge these files from the RC?
FWIW, the Python tarball looks fine.
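If the decision is to purge, a cleanup along these lines could be run on the unpacked source distribution before re-rolling the zip (a sketch only, not the actual release tooling; the file list is taken from the diff above, and the re-zipping and re-signing steps are omitted):

```shell
# Remove generated Python protos and sdist artifacts from the
# unpacked source distribution (paths per the diff above).
rm -f apache-beam-2.2.0/sdks/python/apache_beam/portability/api/*_pb2.py
rm -f apache-beam-2.2.0/sdks/python/apache_beam/portability/api/*_pb2_grpc.py
rm -rf apache-beam-2.2.0/sdks/python/.eggs
rm -f apache-beam-2.2.0/sdks/python/LICENSE \
      apache-beam-2.2.0/sdks/python/NOTICE \
      apache-beam-2.2.0/sdks/python/README.md
```

As Robert notes below, a re-rolled archive also needs to be re-signed.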
On Fri, Nov 17, 2017 at 4:40 PM, Eugene Kirpichov
<kirpic...@google.com.invalid> wrote:
How can I specify a dependency on the staged RC?
E.g.
I'm
trying
to
validate the quickstart per
https://beam.apache.org/get-
started/quickstart-java/
and
specifying
version
2.2.0 doesn't work I suppose because it's not
released
yet.
Should
I
pass
some command-line flag to mvn to make it fetch
the
version
from
the
staging
area?
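For reference, the usual way to test a staged Maven release is to add the staging repository to the project's pom.xml rather than pass a flag (a sketch; the repository id is arbitrary and orgapachebeam-NNNN is a placeholder for the actual staging repository number from the vote email):

```xml
<!-- Hypothetical staging-repository entry for validating the RC;
     replace orgapachebeam-NNNN with the repo announced in the vote. -->
<repositories>
  <repository>
    <id>beam-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachebeam-NNNN/</url>
  </repository>
</repositories>
```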
On Fri, Nov 17, 2017 at 4:37 PM Lukasz Cwik <lc...@google.com.invalid> wrote:
It's open to all; it's just that there are binding votes and non-binding votes.
On Fri, Nov 17, 2017 at 4:26 PM, Valentyn Tymofieiev <valen...@google.com.invalid> wrote:
I have a process question: is the vote open for committers only, or for all contributors?
On Fri, Nov 17, 2017 at 4:06 PM, Lukasz Cwik <lc...@google.com.invalid> wrote:
+1, Approve the release
I have verified the wordcount quickstart on the Apache Beam website using Apex, DirectRunner, Flink & Spark on Linux.
The Gearpump runner does not yet have a quickstart listed on our website. Adding the quickstart is already tracked by this existing issue: https://issues.apache.org/jira/browse/BEAM-2692
On Fri, Nov 17, 2017 at 11:50 AM, Valentyn Tymofieiev <valen...@google.com.invalid> wrote:
I have verified the SHA & MD5 signatures of the Python artifacts in [2], and checked the Python side of the validation checklist on Linux.
There is one known issue in the UserScore example for the Dataflow runner. The issue has been fixed on the master branch and does not require a cherry-pick at this point. A workaround is to pass the --save_main_session pipeline option.
RC4 looks good to me so far.
On Fri, Nov 17, 2017 at 11:30 AM, Kenneth Knowles <k...@google.com.invalid> wrote:
Hi all,
Following up on past discussions and https://issues.apache.org/jira/browse/BEAM-1189 I have prepared a spreadsheet so we can sign up for
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com