Awesome, thanks Reuven!
Regards
JB
On 11/25/2017 07:01 AM, Reuven Lax wrote:
I'll go ahead and send the RESULT email right now.
On Fri, Nov 24, 2017 at 9:56 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
It's not synchronous: promoting the artifacts takes some time (at least 30 minutes), so the artifacts show up on Central only after a delay.
I confirm that it's OK because the artifacts are now on Central:
http://repo.maven.apache.org/maven2/org/apache/beam/beam-sdks-java-core/2.2.0/
By the way, you are promoting the artifacts to Central, but I didn't see any [RESULT] e-mail on the vote thread. You first have to close the vote, then promote the artifacts, announce the release, etc.
Regards
JB
On 11/25/2017 12:43 AM, Reuven Lax wrote:
Appears to be a problem :)
I tried publishing the latest artifact from Apache Nexus to Maven Central. After clicking publish, Nexus claimed that the operation had completed. However, the Maven Central page (https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-core) does not show the 2.2.0 artifacts, and the staging repository has now vanished from the Nexus site! Does anyone know what happened here?
Reuven
On Wed, Nov 22, 2017 at 11:04 PM, Thomas Weise <t...@apache.org> wrote:
+1
Ran the quickstart with the Apex runner in embedded mode and on YARN. It needed a couple of tweaks to get there, though.
1) Change quickstart pom.xml apex-runner profile:
<!--
  Apex 3.6 is built against YARN 2.6. For this fat jar, the
  included version has to match what's on the cluster, hence
  we need to repeat the Apex Hadoop dependencies at the
  required version here.
-->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <scope>runtime</scope>
</dependency>
2) After copying the fat jar to the cluster:
java -cp word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
  --inputFile=file:///tmp/input.txt --output=/tmp/counts \
  --embeddedExecution=false --configFile=beam-runners-apex.properties \
  --runner=ApexRunner
(this was on a single-node cluster, hence the local file path)
The quickstart instructions suggest using *mvn exec:java* instead of *java*; it generally isn't valid to assume that mvn and a build environment exist on the edge node of a YARN cluster.
On Wed, Nov 22, 2017 at 2:12 PM, Nishu <nishuta...@gmail.com> wrote:
Hi Eugene,
I ran it on both standalone Flink (non-YARN) and Flink on an HDInsight cluster (YARN). Both ran successfully. :)
Regards,
Nishu
On Wed, Nov 22, 2017 at 9:40 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
Thanks Nishu. So, if I understand correctly, your pipelines were running on non-YARN, but you're planning to run with YARN?
I meanwhile was able to get Flink running on Dataproc (YARN), and validated the quickstart and game examples.
At this point we need validation for Spark and Flink non-YARN [I think if Nishu's runs were non-YARN, they'd give us enough confidence, combined with the success of other validations of the Spark and Flink runners?], and Apex on YARN. However, it seems that in previous RCs we were not validating Apex on YARN, only a local cluster. Is it needed this time?
On Wed, Nov 22, 2017 at 12:28 PM Nishu <nishuta...@gmail.com> wrote:
Hi Eugene,
No, I didn't try with those; instead I have my custom pipeline where a Kafka topic is the source. I have defined a global window and a processing-time trigger to read the data. Further, it runs some transformations, i.e. GroupByKey and CoGroupByKey, on the windowed collections.
I was running the same pipeline on the direct runner and Spark runner earlier. Today I gave it a try with Flink on YARN.
Best Regards,
Nishu.
On Wed, Nov 22, 2017 at 8:07 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
Thanks Nishu! Can you clarify which pipeline you were running? The validation spreadsheet includes 1) the quickstart and 2) the mobile game walkthroughs. Was it one of these, or your custom pipeline?
On Wed, Nov 22, 2017 at 10:20 AM Nishu <nishuta...@gmail.com> wrote:
Hi,
Typo in previous mail. I meant Flink runner.
Thanks,
Nishu
On Wed, 22 Nov 2017 at 19.17,
Hi,
I built a pipeline using RC 2.2 today and ran it with the runner on YARN. It worked seamlessly for unbounded sources. Couldn't see any issues with my pipeline so far :)
Thanks,
Nishu
On Wed, 22 Nov 2017 at 18.57, Reuven Lax <re...@google.com.invalid> wrote:
Who is validating Flink and YARN?
On Tue, Nov 21, 2017 at 9:26 AM, Kenneth Knowles <k...@google.com.invalid> wrote:
On Mon, Nov 20, 2017 at 5:01 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
In the verification spreadsheet, I'm not sure I understand the difference between the "YARN" and "Standalone cluster/service". Which is Dataproc? It definitely uses YARN, but it is also a standalone cluster/service. Does it count for both?
No, it doesn't. A number of runners have their own non-YARN cluster mode. I would expect that the launching experience might be different and the portable container management to differ. If they are identical, experts in those systems should feel free to coalesce the rows. Conversely, as other platforms become supported, they could be added or not based on whether they are substantively different from a user experience or QA point of view.
Kenn
Seems now we're missing just the Apex and Flink cluster verifications.
*though the Spark runner took 6x longer to run UserScore, partially I guess because it didn't do autoscaling (the Dataflow runner ramped up to 5 workers whereas the Spark runner used 2 workers). For some reason the Spark runner chose not to split the 10GB input files into chunks.
On Mon, Nov 20, 2017 at 3:46 PM Reuven Lax <re...@google.com.invalid> wrote:
Done
On Tue, Nov 21, 2017 at 3:08 AM, Robert Bradshaw <rober...@google.com.invalid> wrote:
Thanks. You need to re-sign as well.
On Mon, Nov 20, 2017 at 12:14 AM, Reuven Lax <re...@google.com.invalid> wrote:
FYI these generated files have been removed from the source distribution.
On Sat, Nov 18, 2017 at 9:09 AM, Reuven Lax <re...@google.com> wrote:
Hmmm, I thought I removed those generated files from the zip file before sending this email. Let me check again.
Reuven
On Sat, Nov 18, 2017 at 8:52 AM, Robert Bradshaw <rober...@google.com.invalid> wrote:
The source distribution contains a couple of files not on github (e.g. folders that were added on master, Python generated files). The pom files differed only by missing -SNAPSHOT; other than that, presumably the source release should just be "wget https://github.com/apache/beam/archive/release-2.2.0.zip"?
diff -rq apache-beam-2.2.0 beam/ | grep -v pom.xml
# OK?
Only in apache-beam-2.2.0: DEPENDENCIES
# Expected.
Only in beam/: .git
Only in beam/: .gitattributes
Only in beam/: .gitignore
# These folders are probably from switching around between master and git branches.
Only in apache-beam-2.2.0: model
Only in apache-beam-2.2.0/runners/flink: examples
Only in apache-beam-2.2.0/runners/flink: runner
Only in apache-beam-2.2.0/runners/gearpump: jarstore
Only in apache-beam-2.2.0/sdks/java/extensions: gcp-core
Only in apache-beam-2.2.0/sdks/java/extensions: sketching
Only in apache-beam-2.2.0/sdks/java/io: file-based-io-tests
Only in apache-beam-2.2.0/sdks/java/io: hdfs
Only in apache-beam-2.2.0/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources: src
Only in apache-beam-2.2.0/sdks/java/maven-archetypes/examples-java8/src/main/resources/archetype-resources: src
Only in apache-beam-2.2.0/sdks/java: microbenchmarks
# Here's the generated protos.
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_artifact_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_artifact_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_fn_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_fn_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_job_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_job_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_provision_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_provision_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_runner_api_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: beam_runner_api_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: endpoints_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: endpoints_pb2.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: standard_window_fns_pb2_grpc.py
Only in apache-beam-2.2.0/sdks/python/apache_beam/portability/api: standard_window_fns_pb2.py
# And some other sdist generated Python files.
Only in apache-beam-2.2.0/sdks/python: .eggs
Only in apache-beam-2.2.0/sdks/python: LICENSE
Only in apache-beam-2.2.0/sdks/python: NOTICE
Only in apache-beam-2.2.0/sdks/python: README.md
Presumably we should just purge these files from the RC?
FWIW, the Python tarball looks fine.
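If the decision is to purge, a cleanup along these lines could be run on the unpacked source distribution before re-rolling the zip (a sketch only, not the actual release tooling; the file list is taken from the diff above, and the re-zipping and re-signing steps are omitted):

```shell
# Remove generated Python protos and sdist artifacts from the
# unpacked source distribution (paths per the diff above).
rm -f apache-beam-2.2.0/sdks/python/apache_beam/portability/api/*_pb2.py
rm -f apache-beam-2.2.0/sdks/python/apache_beam/portability/api/*_pb2_grpc.py
rm -rf apache-beam-2.2.0/sdks/python/.eggs
rm -f apache-beam-2.2.0/sdks/python/LICENSE \
      apache-beam-2.2.0/sdks/python/NOTICE \
      apache-beam-2.2.0/sdks/python/README.md
```

As Robert notes below, a re-rolled archive also needs to be re-signed.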
On Fri, Nov 17, 2017 at 4:40 PM, Eugene Kirpichov
<kirpic...@google.com.invalid> wrote:
How can I specify a dependency on the staged RC?
E.g.
I'm
trying
to
validate the quickstart per
https://beam.apache.org/get-
started/quickstart-java/
and
specifying
version
2.2.0 doesn't work I suppose because it's not
released
yet.
Should
I
pass
some command-line flag to mvn to make it fetch
the
version
from
the
staging
area?
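For reference, the usual way to test a staged Maven release is to add the staging repository to the project's pom.xml rather than pass a flag (a sketch; the repository id is arbitrary and orgapachebeam-NNNN is a placeholder for the actual staging repository number from the vote email):

```xml
<!-- Hypothetical staging-repository entry for validating the RC;
     replace orgapachebeam-NNNN with the repo announced in the vote. -->
<repositories>
  <repository>
    <id>beam-staging</id>
    <url>https://repository.apache.org/content/repositories/orgapachebeam-NNNN/</url>
  </repository>
</repositories>
```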
On Fri, Nov 17, 2017 at 4:37 PM Lukasz Cwik <lc...@google.com.invalid> wrote:
It's open to all; it's just that there are binding votes and non-binding votes.
On Fri, Nov 17, 2017 at 4:26 PM, Valentyn Tymofieiev <valen...@google.com.invalid> wrote:
I have a process question: is the vote open for committers only, or for all contributors?
On Fri, Nov 17, 2017 at 4:06 PM, Lukasz Cwik <lc...@google.com.invalid> wrote:
+1, Approve the release
I have verified the wordcount quickstart on the Apache Beam website using Apex, DirectRunner, Flink & Spark on Linux.
The Gearpump runner does not yet have a quickstart listed on our website. Adding the quickstart is already tracked by this existing issue: https://issues.apache.org/jira/browse/BEAM-2692
On Fri, Nov 17, 2017 at 11:50 AM, Valentyn Tymofieiev <valen...@google.com.invalid> wrote:
I have verified the SHA & MD5 signatures of the Python artifacts in [2], and checked the Python side of the validation checklist on Linux.
There is one known issue in the UserScore example for the Dataflow runner. The issue has been fixed on the master branch and does not require a cherry-pick at this point. A workaround is to pass the --save_main_session pipeline option.
RC4 looks good to me so far.
On Fri, Nov 17, 2017 at 11:30 AM, Kenneth Knowles <k...@google.com.invalid> wrote:
Hi all,
Following up on past discussions and https://issues.apache.org/jira/browse/BEAM-1189 I have prepared a spreadsheet so we can sign up for
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com