Re: Failure in Apex runner

2017-07-09 Thread Kenneth Knowles
On that same subject: in the SDK-angnostic proto for a pipeline there is no
such thing as a main output [1]. The distinction between single and
multiple output ParDo is SDK-specific.

Kenn

[1]
https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/beam_runner_api.proto#L157

On Sun, Jul 9, 2017 at 8:38 AM, Reuven Lax  wrote:

> Yes. Semantically all outputs from a ParDo are equivalent, so the watermark
> should traverse them all. The only reason a "default" output exists is for
> convenience so we don't force users to always specify an output tag.
>
> On Sun, Jul 9, 2017 at 12:03 AM, Thomas Weise  wrote:
>
> > This error turns out to be deterministic and debug friendly :) I enabled
> > trace and found that the watermark "disappears" between the following two
> > operators:
> >
> > [8/WriteCounts/WriteFiles/WriteBundles:ApexParDoOperator]
> >
> > [10/WriteCounts/WriteFiles/GroupUnwritten:ApexGroupByKeyOperator]
> > GroupUnwritten takes input from additonal outputs, but the watermark is
> > only emitted on the main output. When I modify ApexParDoOperator to emit
> > the watermark also on additionalOutput1, it traverses the pipeline and
> the
> > test passes.
> >
> > Are watermarks supposed to be written to additional outputs?
> >
> > Thanks,
> > Thomas
> >
> >
> > On Thu, Jul 6, 2017 at 1:35 PM, Reuven Lax 
> > wrote:
> >
> > > Thomas, any suggestions on what we should do? Do you have an idea
> what's
> > > going on, or should we exclude this test for now until you have time to
> > > look at it?
> > >
> > > Reuven
> > >
> > > On Wed, Jul 5, 2017 at 3:36 PM, Reuven Lax  wrote:
> > >
> > > > I wonder if the watermark is accidentally advancing too early,
> causing
> > > > Apex to shut down the pipeline before the final finalize DoFn
> executes?
> > > >
> > > > On Wed, Jul 5, 2017 at 1:45 PM, Thomas Weise  wrote:
> > > >
> > > >> I don't think this is a problem with the test and if anything this
> > > problem
> > > >> to me shows the test is useful in catching similar issues during
> unit
> > > test
> > > >> runs.
> > > >>
> > > >> Is there any form of asynchronous/trigger based processing in this
> > > >> pipeline
> > > >> that could cause this?
> > > >>
> > > >> The Apex runner will shutdown the pipeline after the final
> watermark,
> > > the
> > > >> shutdown signal traverses the pipeline just like a watermark, but it
> > is
> > > >> not
> > > >> seen by user code.
> > > >>
> > > >> Thomas
> > > >>
> > > >> --
> > > >> sent from mobile
> > > >> On Jul 5, 2017 1:19 PM, "Kenneth Knowles" 
> > > wrote:
> > > >>
> > > >> > Upon further investigation, this tests always writes to
> > > >> > ./target/wordcountresult-0-of-2 and
> > > >> > ./target/wordcountresult-1-of-2. So after a successful
> test
> > > >> run,
> > > >> > any further run without a `clean` will spuriously succeed. I was
> > > running
> > > >> > via IntelliJ so did not do the ritual `mvn clean` workaround. So
> > > >> > reproduction appears to be easy and we could fix the test (if we
> > don't
> > > >> > remove it) to use a fresh temp dir.
> > > >> >
> > > >> > This seems to point to a bug in waitUntilFinish() and/or Apex if
> the
> > > >> > topology is shut down before this ParDo is run. This is a ParDo
> with
> > > >> > trivial bounded input but with side inputs. So I would guess the
> bug
> > > is
> > > >> > either in watermark tracking / readiness of the side input or just
> > how
> > > >> > PushbackSideInputDoFnRunner is used.
> > > >> >
> > > >> > On Wed, Jul 5, 2017 at 12:23 PM, Reuven Lax
> >  > > >
> > > >> > wrote:
> > > >> >
> > > >> > > I've done a bit more debugging with logging. It appears that the
> > > >> finalize
> > > >> > > ParDo is never being invoked in this Apex test (or at least the
> > > >> LOG.info
> > > >> > in
> > > >> > > that ParDo never runs). This ParDo is run on a constant element
> > > (code
> > > >> > > snippet below), so it should always run.
> > > >> > >
> > > >> > > PCollection singletonCollection = p.apply(Create.of((Void)
> > > >> null));
> > > >> > > singletonCollection
> > > >> > > .apply("Finalize", ParDo.of(new DoFn() {
> > > >> > >   @ProcessElement
> > > >> > >   public void processElement(ProcessContext c) throws
> > Exception
> > > {
> > > >> > > LOG.info("Finalizing write operation {}.",
> > writeOperation);
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles
> > > >>  > > >> > >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Data-dependent file destinations is a pretty great feature. We
> > > also
> > > >> > have
> > > >> > > > another change to make to this @Experimental feature, and it
> > would
> > > >> be
> > > >> > > nice
> > > >> > > > to get them both into 2.1.0 if we can unblock this quickly.
> > > >> > > >
> > > >> > > > I just tried this too, and failed to reproduce it. But Jenkins
> > and
> > > >> > Reuven
> > > >> > > > both have a reliable repro.
> > > >> > > >
> > > >> > > > Questionss:
> >

Re: BEAM-933 - Not reproduceable

2017-07-09 Thread Kenneth Knowles
I think the key line you will want to change is here:
https://github.com/apache/beam/blob/master/examples/java/pom.xml#L375

On Sun, Jul 9, 2017 at 12:17 AM, Apache Enthu  wrote:

> Hi,
>
> Is BEAM-933 already fixed? I'm unable to reproduce the bug by running maven
> build. Here's what i see:
>
> [INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @
> beam-examples-java ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 15 source files to
> C:\workspace-apache\beam\examples\java\target\test-classes
> [INFO]
> /C:/workspace-apache/beam/examples/java/src/test/java/
> org/apache/beam/examples/WindowedWordCountIT.java:
> C:\workspace-apache\beam\examples\java\src\test\java\
> org\apache\beam\examples\WindowedWordCountIT.java
> uses or overrides a deprecated API.
> [INFO]
> /C:/workspace-apache/beam/examples/java/src/test/java/
> org/apache/beam/examples/WindowedWordCountIT.java:
> Recompile with -Xlint:deprecation for details.
> [INFO]
> /C:/workspace-apache/beam/examples/java/src/test/java/
> org/apache/beam/examples/complete/AutoCompleteTest.java:
> Some input files use unchecked or unsafe operations.
> [INFO]
> /C:/workspace-apache/beam/examples/java/src/test/java/
> org/apache/beam/examples/complete/AutoCompleteTest.java:
> Recompile with -Xlint:unchecked for details.
> [INFO]
>
>
>
> *[INFO] --- maven-checkstyle-plugin:2.17:check (default) @
> beam-examples-java ---[INFO] Starting audit...Audit done.*[INFO]
> [INFO] --- maven-surefire-plugin:2.20:test (default-test) @
> beam-examples-java ---
> [INFO]
>
> Could you please check and let me know, so we could close this issue.
>
> Also there seems to be an issue with DebuggingWordCountTest, running on
> Windows. It says:
>
>
> *org.apache.beam.sdk.Pipeline$PipelineExecutionException:
> java.lang.IllegalStateException: Unable to find registrar for c*at
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
> at
> org.apache.beam.examples.DebuggingWordCount.main(
> DebuggingWordCount.java:160)
> at
> org.apache.beam.examples.DebuggingWordCountTest.testDebuggingWordCount(
> DebuggingWordCountTest.java:53)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
> FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(
> ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(
> FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.
> evaluate(InvokeMethod.java:17)
> at org.junit.rules.ExternalResource$1.evaluate(
> ExternalResource.java:48)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(
> BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(
> BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(
> JUnit4TestReference.java:86)
> at
> org.eclipse.jdt.internal.junit.runner.TestExecution.
> run(TestExecution.java:38)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.
> runTests(RemoteTestRunner.java:459)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.
> runTests(RemoteTestRunner.java:675)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.
> run(RemoteTestRunner.java:382)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.
> main(RemoteTestRunner.java:192)
> Caused by: java.lang.IllegalStateException: Unable to find registrar for c
> at
> org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(
> FileSystems.java:447)
> at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
> at
> org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(
> FileBasedSource.java:207)
> at
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.
> getInitialInputs(BoundedReadEvaluatorFactory.java:207)
> at
> org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.
> getInitialInputs(ReadEvaluatorFactory.java:87)
> at
> org.apache.beam.

Re: BEAM-934 - Jira permission and pull request

2017-07-09 Thread Kenneth Knowles
Just a tiny correction - I think the JIRA role "contributor" for the Beam
can take JIRAs without a committer assigning to them. But definitely you
_must_ have this role or even a committer cannot give you a JIRA.

What is your JIRA account, so I can add you as a contributor?

Kenn

On Sun, Jul 9, 2017 at 12:18 PM, Jean-Baptiste Onofré 
wrote:

> By Jira ID, I mean YOUR account ID.
>
> Committer is a long process: https://beam.apache.org/contri
> bute/contribution-guide/#granting-more-rights-to-a-contributor
>
> I suggest to take a long on how Apache works:
>
> http://www.apache.org/
>
> Regards
> JB
>
>
> On 07/09/2017 07:47 PM, Apache Enthu wrote:
>
>> Thanks JB. Jira Id is in Subject BEAM-934.
>> https://issues.apache.org/jira/browse/BEAM-934
>>
>> How do i get added as committer please? Or are there any criteria for me
>> to
>> be added to as Committer?
>>
>> Thanks,
>> Almas
>>
>> On Sun, Jul 9, 2017 at 5:40 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>> Hi,
>>>
>>> you have to be committer to do the assignment.
>>>
>>> If you provide your Jira ID, I will assign the Jira to you.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 07/09/2017 08:30 AM, Apache Enthu wrote:
>>>
>>> Hi,

 I'm newbie in this project and i have picked up simple jira from the
 open
 jira list.

 It seems i don't have permission to assign jira to myself and move it
 through its lifecycle.

 I have created the pull request https://github.com/apache/beam
 /pull/3526

 Could you please let me know how could i get permission in Jira. Also
 please could you approve my pull request.

 Thanks,
 Almas


 --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: BEAM-934 - Jira permission and pull request

2017-07-09 Thread Jean-Baptiste Onofré

By Jira ID, I mean YOUR account ID.

Committer is a long process: 
https://beam.apache.org/contribute/contribution-guide/#granting-more-rights-to-a-contributor


I suggest to take a long on how Apache works:

http://www.apache.org/

Regards
JB

On 07/09/2017 07:47 PM, Apache Enthu wrote:

Thanks JB. Jira Id is in Subject BEAM-934.
https://issues.apache.org/jira/browse/BEAM-934

How do i get added as committer please? Or are there any criteria for me to
be added to as Committer?

Thanks,
Almas

On Sun, Jul 9, 2017 at 5:40 PM, Jean-Baptiste Onofré 
wrote:


Hi,

you have to be committer to do the assignment.

If you provide your Jira ID, I will assign the Jira to you.

Regards
JB


On 07/09/2017 08:30 AM, Apache Enthu wrote:


Hi,

I'm newbie in this project and i have picked up simple jira from the open
jira list.

It seems i don't have permission to assign jira to myself and move it
through its lifecycle.

I have created the pull request https://github.com/apache/beam/pull/3526

Could you please let me know how could i get permission in Jira. Also
please could you approve my pull request.

Thanks,
Almas



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: BEAM-934 - Jira permission and pull request

2017-07-09 Thread Apache Enthu
Thanks JB. Jira Id is in Subject BEAM-934.
https://issues.apache.org/jira/browse/BEAM-934

How do i get added as committer please? Or are there any criteria for me to
be added to as Committer?

Thanks,
Almas

On Sun, Jul 9, 2017 at 5:40 PM, Jean-Baptiste Onofré 
wrote:

> Hi,
>
> you have to be committer to do the assignment.
>
> If you provide your Jira ID, I will assign the Jira to you.
>
> Regards
> JB
>
>
> On 07/09/2017 08:30 AM, Apache Enthu wrote:
>
>> Hi,
>>
>> I'm newbie in this project and i have picked up simple jira from the open
>> jira list.
>>
>> It seems i don't have permission to assign jira to myself and move it
>> through its lifecycle.
>>
>> I have created the pull request https://github.com/apache/beam/pull/3526
>>
>> Could you please let me know how could i get permission in Jira. Also
>> please could you approve my pull request.
>>
>> Thanks,
>> Almas
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [PROPOSAL] Connectors for memcache and Couchbase

2017-07-09 Thread Lukasz Cwik
For the source:
Do you plan to support enumerating all the keys via cachedump / lru_crawler
metadump / ...?
If there is an option which doesn't require enumerating the keys, how will
splitting be done (no splitting / splitting on slab ids / ...)?
Can the cache be read while its still being modified (will effectively a
snapshot be made using a watcher or is it expected that the cache will be
read only or inconsistent when reading)?

Also, as a usability point, all PTransforms are meant to be applied to
PCollections and not vice versa.
e.g.
PCollection keys = ...;
keys.apply(MemCacheIO.withConfig());

This makes it so that people can write:
PCollection<...> output =
input.apply(ptransform1).apply(ptransform2).apply(...);
It also makes it so that a PTransform can be applied to multiple
PCollections.

If you haven't already, I would also suggest that you take a look at the
Pipeline I/O guide: https://beam.apache.org/documentation/io/io-toc/
Talks about various usability points and how to write a good I/O connector.


On Sat, Jul 8, 2017 at 9:31 PM, Jean-Baptiste Onofré 
wrote:

> Hi,
>
> Great job !
>
> I'm looking forward for the PRs review.
>
> Regards
> JB
>
>
> On 07/08/2017 09:50 AM, Madhusudan Borkar wrote:
>
>> Hi,
>> We are proposing to build connectors for memcache first and then use it
>> for
>> Couchbase. The connector for memcache will be build as a IOTransform and
>> then it can be used for other memcache implementations including
>> Couchbase.
>>
>> 1. As Source
>>
>> input will be a key(String / byte[]), output will be a KV
>>
>> where key - String / byte[]
>>
>> value - String / byte[]
>>
>> Spymemcached supports a multi-get operation where it takes a bunch of
>> keys and retrieves the associated values, the input PCollection can
>> be
>> bundled into multiple batches and each batch can be submitted via the
>> multi-get operation.
>>
>> PCollection> values =
>>
>> MemCacheIO
>>
>> .withConfig()
>>
>> .read()
>>
>> .withKey(PCollection);
>>
>>
>> 2. As Sink
>>
>> input will be a KV, output will be none or probably a
>> boolean indicating the outcome of the operation
>>
>>
>>
>>
>>
>> //write
>>
>> MemCacheIO
>>
>> .withConfig()
>>
>> .write()
>>
>> .withEntries(PCollection>);
>>
>>
>> Implementation plan
>>
>> 1. Develop Memcache connector with 'set' and 'add' operation
>>
>> 2. Then develop other operations
>>
>> 3. Use Memcache connector for Couchbase
>>
>>
>> Thanks @Ismael for help
>>
>> Please, let us know your views.
>>
>> Madhu Borkar
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Failure in Apex runner

2017-07-09 Thread Reuven Lax
Yes. Semantically all outputs from a ParDo are equivalent, so the watermark
should traverse them all. The only reason a "default" output exists is for
convenience so we don't force users to always specify an output tag.

On Sun, Jul 9, 2017 at 12:03 AM, Thomas Weise  wrote:

> This error turns out to be deterministic and debug friendly :) I enabled
> trace and found that the watermark "disappears" between the following two
> operators:
>
> [8/WriteCounts/WriteFiles/WriteBundles:ApexParDoOperator]
>
> [10/WriteCounts/WriteFiles/GroupUnwritten:ApexGroupByKeyOperator]
> GroupUnwritten takes input from additonal outputs, but the watermark is
> only emitted on the main output. When I modify ApexParDoOperator to emit
> the watermark also on additionalOutput1, it traverses the pipeline and the
> test passes.
>
> Are watermarks supposed to be written to additional outputs?
>
> Thanks,
> Thomas
>
>
> On Thu, Jul 6, 2017 at 1:35 PM, Reuven Lax 
> wrote:
>
> > Thomas, any suggestions on what we should do? Do you have an idea what's
> > going on, or should we exclude this test for now until you have time to
> > look at it?
> >
> > Reuven
> >
> > On Wed, Jul 5, 2017 at 3:36 PM, Reuven Lax  wrote:
> >
> > > I wonder if the watermark is accidentally advancing too early, causing
> > > Apex to shut down the pipeline before the final finalize DoFn executes?
> > >
> > > On Wed, Jul 5, 2017 at 1:45 PM, Thomas Weise  wrote:
> > >
> > >> I don't think this is a problem with the test and if anything this
> > problem
> > >> to me shows the test is useful in catching similar issues during unit
> > test
> > >> runs.
> > >>
> > >> Is there any form of asynchronous/trigger based processing in this
> > >> pipeline
> > >> that could cause this?
> > >>
> > >> The Apex runner will shutdown the pipeline after the final watermark,
> > the
> > >> shutdown signal traverses the pipeline just like a watermark, but it
> is
> > >> not
> > >> seen by user code.
> > >>
> > >> Thomas
> > >>
> > >> --
> > >> sent from mobile
> > >> On Jul 5, 2017 1:19 PM, "Kenneth Knowles" 
> > wrote:
> > >>
> > >> > Upon further investigation, this tests always writes to
> > >> > ./target/wordcountresult-0-of-2 and
> > >> > ./target/wordcountresult-1-of-2. So after a successful test
> > >> run,
> > >> > any further run without a `clean` will spuriously succeed. I was
> > running
> > >> > via IntelliJ so did not do the ritual `mvn clean` workaround. So
> > >> > reproduction appears to be easy and we could fix the test (if we
> don't
> > >> > remove it) to use a fresh temp dir.
> > >> >
> > >> > This seems to point to a bug in waitUntilFinish() and/or Apex if the
> > >> > topology is shut down before this ParDo is run. This is a ParDo with
> > >> > trivial bounded input but with side inputs. So I would guess the bug
> > is
> > >> > either in watermark tracking / readiness of the side input or just
> how
> > >> > PushbackSideInputDoFnRunner is used.
> > >> >
> > >> > On Wed, Jul 5, 2017 at 12:23 PM, Reuven Lax
>  > >
> > >> > wrote:
> > >> >
> > >> > > I've done a bit more debugging with logging. It appears that the
> > >> finalize
> > >> > > ParDo is never being invoked in this Apex test (or at least the
> > >> LOG.info
> > >> > in
> > >> > > that ParDo never runs). This ParDo is run on a constant element
> > (code
> > >> > > snippet below), so it should always run.
> > >> > >
> > >> > > PCollection singletonCollection = p.apply(Create.of((Void)
> > >> null));
> > >> > > singletonCollection
> > >> > > .apply("Finalize", ParDo.of(new DoFn() {
> > >> > >   @ProcessElement
> > >> > >   public void processElement(ProcessContext c) throws
> Exception
> > {
> > >> > > LOG.info("Finalizing write operation {}.",
> writeOperation);
> > >> > >
> > >> > >
> > >> > > On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles
> > >>  > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Data-dependent file destinations is a pretty great feature. We
> > also
> > >> > have
> > >> > > > another change to make to this @Experimental feature, and it
> would
> > >> be
> > >> > > nice
> > >> > > > to get them both into 2.1.0 if we can unblock this quickly.
> > >> > > >
> > >> > > > I just tried this too, and failed to reproduce it. But Jenkins
> and
> > >> > Reuven
> > >> > > > both have a reliable repro.
> > >> > > >
> > >> > > > Questionss:
> > >> > > >
> > >> > > >  - Any ideas about how these configurations differ?
> > >> > > >  - Does this actually affect users?
> > >> > > >  - Once we have another test that catches this issue, can we
> > delete
> > >> > this
> > >> > > > test?
> > >> > > >
> > >> > > > Every other test passes, including the actual example
> WordCountIT.
> > >> > Since
> > >> > > > the PR doesn't change primitives, it also seems like it is an
> > >> existing
> > >> > > > issue. And the test seems redundant with our other testing but
> > won't
> > >> > get
> > >> > > as
> > >> > > > much maintenance attention. I don't want to stop catching
> wh

Re: BEAM-934 - Jira permission and pull request

2017-07-09 Thread Jean-Baptiste Onofré

Hi,

you have to be committer to do the assignment.

If you provide your Jira ID, I will assign the Jira to you.

Regards
JB

On 07/09/2017 08:30 AM, Apache Enthu wrote:

Hi,

I'm newbie in this project and i have picked up simple jira from the open
jira list.

It seems i don't have permission to assign jira to myself and move it
through its lifecycle.

I have created the pull request https://github.com/apache/beam/pull/3526

Could you please let me know how could i get permission in Jira. Also
please could you approve my pull request.

Thanks,
Almas



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Build failed in Jenkins: beam_Release_NightlySnapshot #472

2017-07-09 Thread Apache Jenkins Server
See 


--
[...truncated 1.18 MB...]
2017-07-09T07:31:48.256 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/caffinitas/ohc/ohc-core/0.4.3/ohc-core-0.4.3.jar
2017-07-09T07:31:48.308 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/caffinitas/ohc/ohc-core/0.4.3/ohc-core-0.4.3.jar
 (125 KB at 14.4 KB/sec)
2017-07-09T07:31:48.308 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/2.2.6/caffeine-2.2.6.jar
2017-07-09T07:31:48.637 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/2.2.6/caffeine-2.2.6.jar
 (926 KB at 103.6 KB/sec)
2017-07-09T07:31:48.637 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.1.1/cassandra-driver-core-3.1.1.jar
2017-07-09T07:31:48.672 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.jar
 (2257 KB at 251.7 KB/sec)
2017-07-09T07:31:48.673 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final.jar
 (2219 KB at 247.4 KB/sec)
2017-07-09T07:31:48.803 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.1.1/cassandra-driver-core-3.1.1.jar
 (1029 KB at 113.0 KB/sec)
2017-07-09T07:31:49.609 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/storm/storm-core/1.0.1/storm-core-1.0.1.jar
 (19650 KB at 1984.4 KB/sec)
2017-07-09T07:31:50.090 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/it/unimi/dsi/fastutil/6.5.7/fastutil-6.5.7.jar
 (16508 KB at 1589.8 KB/sec)
2017-07-09T07:31:50.155 [INFO] Downloading: 
http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
2017-07-09T07:31:50.155 [INFO] Downloading: 
http://conjars.org/repo/org/apache/spark/spark-core_2.10/1.6.3/spark-core_2.10-1.6.3.jar
2017-07-09T07:31:50.161 [INFO] Downloading: 
http://conjars.org/repo/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
2017-07-09T07:31:50.161 [INFO] Downloading: 
http://conjars.org/repo/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar
2017-07-09T07:31:50.161 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
2017-07-09T07:31:50.217 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-core/2.6.3/cascading-core-2.6.3.jar
2017-07-09T07:31:50.218 [INFO] Downloading: 
http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar
2017-07-09T07:31:50.219 [INFO] Downloading: 
http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
2017-07-09T07:31:50.330 [INFO] Downloaded: 
http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
 (48 KB at 272.5 KB/sec)
2017-07-09T07:31:50.330 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
2017-07-09T07:31:50.331 [INFO] Downloaded: 
http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar (12 KB at 65.2 
KB/sec)
2017-07-09T07:31:50.497 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
 (43 KB at 126.0 KB/sec)
2017-07-09T07:31:50.545 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
 (246 KB at 638.7 KB/sec)
2017-07-09T07:31:50.549 [INFO] Downloaded: 
http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
 (230 KB at 590.9 KB/sec)
2017-07-09T07:31:50.885 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-core/2.6.3/cascading-core-2.6.3.jar 
(680 KB at 938.5 KB/sec)
2017-07-09T07:31:50.888 [INFO] Downloading: 
http://clojars.org/repo/org/apache/spark/spark-core_2.10/1.6.3/spark-core_2.10-1.6.3.jar
2017-07-09T07:31:50.889 [INFO] Downloading: 
http://clojars.org/repo/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar
2017-07-09T07:31:50.889 [INFO] Downloading: 
http://clojars.org/repo/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
2017-07-09T07:31:50.959 [INFO] Downloading: 
https://repository.apache.org/content/repositories/releases/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar
2017-07-09T07:31:50.959 [INFO] Downloading: 
https://repository.apache.org/content/repositories/releases/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
2017-07-09T07:31:51.078 [INFO] Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar
2017-07-09T07:31:51.078 [INFO] Downloading: 
https://repository.jboss.org/nexus/content/repositories/releases/org/scala-lang/scala-compiler/2.10.0/scala-compiler-2.10.0.jar
2017-07-09T07:31:51.457 [INFO] Downl

BEAM-933 - Not reproduceable

2017-07-09 Thread Apache Enthu
Hi,

Is BEAM-933 already fixed? I'm unable to reproduce the bug by running maven
build. Here's what i see:

[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @
beam-examples-java ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 15 source files to
C:\workspace-apache\beam\examples\java\target\test-classes
[INFO]
/C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java:
C:\workspace-apache\beam\examples\java\src\test\java\org\apache\beam\examples\WindowedWordCountIT.java
uses or overrides a deprecated API.
[INFO]
/C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java:
Recompile with -Xlint:deprecation for details.
[INFO]
/C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/complete/AutoCompleteTest.java:
Some input files use unchecked or unsafe operations.
[INFO]
/C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/complete/AutoCompleteTest.java:
Recompile with -Xlint:unchecked for details.
[INFO]



*[INFO] --- maven-checkstyle-plugin:2.17:check (default) @
beam-examples-java ---[INFO] Starting audit...Audit done.*[INFO]
[INFO] --- maven-surefire-plugin:2.20:test (default-test) @
beam-examples-java ---
[INFO]

Could you please check and let me know, so we could close this issue.

Also there seems to be an issue with DebuggingWordCountTest, running on
Windows. It says:


*org.apache.beam.sdk.Pipeline$PipelineExecutionException:
java.lang.IllegalStateException: Unable to find registrar for c*at
org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
at
org.apache.beam.examples.DebuggingWordCount.main(DebuggingWordCount.java:160)
at
org.apache.beam.examples.DebuggingWordCountTest.testDebuggingWordCount(DebuggingWordCountTest.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.IllegalStateException: Unable to find registrar for c
at
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
at
org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:207)
at
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:207)
at
org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:87)
at
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:62)


I have fixed this locally but i dont have permission to create the jira.
Could you please let me know the procedure to get me entitled to create and
update jira?

https://beam.apache.org/contribute/ Doesn't have much info about the same.

Thanks,
Almas


Re: Failure in Apex runner

2017-07-09 Thread Thomas Weise
This error turns out to be deterministic and debug friendly :) I enabled
trace and found that the watermark "disappears" between the following two
operators:

[8/WriteCounts/WriteFiles/WriteBundles:ApexParDoOperator]

[10/WriteCounts/WriteFiles/GroupUnwritten:ApexGroupByKeyOperator]
GroupUnwritten takes input from additonal outputs, but the watermark is
only emitted on the main output. When I modify ApexParDoOperator to emit
the watermark also on additionalOutput1, it traverses the pipeline and the
test passes.

Are watermarks supposed to be written to additional outputs?

Thanks,
Thomas


On Thu, Jul 6, 2017 at 1:35 PM, Reuven Lax  wrote:

> Thomas, any suggestions on what we should do? Do you have an idea what's
> going on, or should we exclude this test for now until you have time to
> look at it?
>
> Reuven
>
> On Wed, Jul 5, 2017 at 3:36 PM, Reuven Lax  wrote:
>
> > I wonder if the watermark is accidentally advancing too early, causing
> > Apex to shut down the pipeline before the final finalize DoFn executes?
> >
> > On Wed, Jul 5, 2017 at 1:45 PM, Thomas Weise  wrote:
> >
> >> I don't think this is a problem with the test and if anything this
> problem
> >> to me shows the test is useful in catching similar issues during unit
> test
> >> runs.
> >>
> >> Is there any form of asynchronous/trigger based processing in this
> >> pipeline
> >> that could cause this?
> >>
> >> The Apex runner will shutdown the pipeline after the final watermark,
> the
> >> shutdown signal traverses the pipeline just like a watermark, but it is
> >> not
> >> seen by user code.
> >>
> >> Thomas
> >>
> >> --
> >> sent from mobile
> >> On Jul 5, 2017 1:19 PM, "Kenneth Knowles" 
> wrote:
> >>
> >> > Upon further investigation, this tests always writes to
> >> > ./target/wordcountresult-0-of-2 and
> >> > ./target/wordcountresult-1-of-2. So after a successful test
> >> run,
> >> > any further run without a `clean` will spuriously succeed. I was
> running
> >> > via IntelliJ so did not do the ritual `mvn clean` workaround. So
> >> > reproduction appears to be easy and we could fix the test (if we don't
> >> > remove it) to use a fresh temp dir.
> >> >
> >> > This seems to point to a bug in waitUntilFinish() and/or Apex if the
> >> > topology is shut down before this ParDo is run. This is a ParDo with
> >> > trivial bounded input but with side inputs. So I would guess the bug
> is
> >> > either in watermark tracking / readiness of the side input or just how
> >> > PushbackSideInputDoFnRunner is used.
> >> >
> >> > On Wed, Jul 5, 2017 at 12:23 PM, Reuven Lax  >
> >> > wrote:
> >> >
> >> > > I've done a bit more debugging with logging. It appears that the
> >> finalize
> >> > > ParDo is never being invoked in this Apex test (or at least the
> >> LOG.info
> >> > in
> >> > > that ParDo never runs). This ParDo is run on a constant element
> (code
> >> > > snippet below), so it should always run.
> >> > >
> >> > > PCollection singletonCollection = p.apply(Create.of((Void)
> >> null));
> >> > > singletonCollection
> >> > > .apply("Finalize", ParDo.of(new DoFn() {
> >> > >   @ProcessElement
> >> > >   public void processElement(ProcessContext c) throws Exception
> {
> >> > > LOG.info("Finalizing write operation {}.", writeOperation);
> >> > >
> >> > >
> >> > > On Wed, Jul 5, 2017 at 11:22 AM, Kenneth Knowles
> >>  >> > >
> >> > > wrote:
> >> > >
> >> > > > Data-dependent file destinations is a pretty great feature. We
> also
> >> > have
> >> > > > another change to make to this @Experimental feature, and it would
> >> be
> >> > > nice
> >> > > > to get them both into 2.1.0 if we can unblock this quickly.
> >> > > >
> >> > > > I just tried this too, and failed to reproduce it. But Jenkins and
> >> > Reuven
> >> > > > both have a reliable repro.
> >> > > >
> >> > > > Questionss:
> >> > > >
> >> > > >  - Any ideas about how these configurations differ?
> >> > > >  - Does this actually affect users?
> >> > > >  - Once we have another test that catches this issue, can we
> delete
> >> > this
> >> > > > test?
> >> > > >
> >> > > > Every other test passes, including the actual example WordCountIT.
> >> > Since
> >> > > > the PR doesn't change primitives, it also seems like it is an
> >> existing
> >> > > > issue. And the test seems redundant with our other testing but
> won't
> >> > get
> >> > > as
> >> > > > much maintenance attention. I don't want to stop catching whatever
> >> this
> >> > > > issue is, though.
> >> > > >
> >> > > > Kenn
> >> > > >
> >> > > > On Wed, Jul 5, 2017 at 10:31 AM, Reuven Lax
> >> 
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Thomas,
> >> > > > >
> >> > > > > This only happens with https://github.com/apache/beam/pull/3356
> .
> >> > > > >
> >> > > > > Reuven
> >> > > > >
> >> > > > > On Mon, Jul 3, 2017 at 6:11 AM, Thomas Weise 
> >> wrote:
> >> > > > >
> >> > > > > > Hi Reuven,
> >> > > > > >
> >> > > > > > I'm not able to reproduce the issue locally. I was hoping to
> see
> >> > >