[jira] [Updated] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Summary: JSON sources and sinks  (was: JsonIO)

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> A new IO (with source and sink) which will read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, this IO should have a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1757) Improve log configuration when using multiple logging frameworks in Pipeline

2017-03-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1757:
---

Assignee: Aviem Zur

> Improve log configuration when using multiple logging frameworks in Pipeline
> 
>
> Key: BEAM-1757
> URL: https://issues.apache.org/jira/browse/BEAM-1757
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>Priority: Minor
>
> We need to improve / document how to configure the different logging 
> frameworks to support the different Beam modules (SDK/Runners/IOs).
> I have seen this reported already in slack, but it also bit me recently when 
> trying to log things with a module whose dependencies only supported java 
> based logging over slf4j (no log4j). I tried to provide the classical 
> log4j.properties but I could not get it to log anything, nothing from the 
> sdk, and logically nothing from the java logging framework.





[jira] [Assigned] (BEAM-1726) TestFlinkRunner should assert PAssert success/failure counters

2017-03-17 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1726:
---

Assignee: Aviem Zur  (was: Aljoscha Krettek)

> TestFlinkRunner should assert PAssert success/failure counters
> --
>
> Key: BEAM-1726
> URL: https://issues.apache.org/jira/browse/BEAM-1726
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> {{TestFlinkRunner}} should assert the value of {{PAssert}}'s success/failure
> aggregators, i.e. assert that:
> {code}
> [PAssert success counter] == [number of assertions defined in the pipeline]
> &&
> [PAssert failure counter] == 0
> {code}
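The check above can be sketched in plain Java. This is illustrative only: the method name and counter values below are stand-ins for {{PAssert}}'s aggregator values, not actual Beam or Flink API.

```java
// Sketch of the post-run check a test runner could perform. The counter
// values stand in for PAssert's success/failure aggregator values; none of
// this is real Beam/Flink API.
public class PAssertCheckSketch {

    // True only if every defined assertion ran successfully and none failed.
    static boolean allAssertionsPassed(int successCount, int failureCount,
                                       int expectedAssertions) {
        return successCount == expectedAssertions && failureCount == 0;
    }

    public static void main(String[] args) {
        // Three assertions defined, three succeeded, none failed: pass.
        System.out.println(allAssertionsPassed(3, 0, 3));
        // Silent-bug case: no assertion ran at all, yet nothing "failed".
        // Checking failureCount == 0 alone would miss this.
        System.out.println(allAssertionsPassed(0, 0, 3));
    }
}
```

Comparing the success count against the number of assertions defined (rather than only checking for failures) is what catches the silent case where assertions never ran.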





[jira] [Commented] (BEAM-775) Remove Aggregators from the Java SDK

2017-03-21 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934683#comment-15934683
 ] 

Aviem Zur commented on BEAM-775:


Awesome to see we are moving forward with the removal of aggregators.

However, I see that the PRs related to this ticket replace aggregators with 
metrics.

Since metrics are not supported yet in all runners, how is this going to work?

My chief concern is the success and failure counters in {{PAssert}}, since these
verify that {{RunnableOnService}} tests do in fact test what they are supposed
to (there have been, and still are, instances in which a runner did not perform
the assertion but the test passed, masking an actual bug in the runner).

> Remove Aggregators from the Java SDK
> 
>
> Key: BEAM-775
> URL: https://issues.apache.org/jira/browse/BEAM-775
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Ben Chambers
>  Labels: backward-incompatible
> Fix For: First stable release
>
>






[jira] [Comment Edited] (BEAM-775) Remove Aggregators from the Java SDK

2017-03-21 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935726#comment-15935726
 ] 

Aviem Zur edited comment on BEAM-775 at 3/22/17 3:46 AM:
-

As [~aljoscha] mentioned, we are adding this assertion to the Flink runner as
well, and it has surfaced bugs which were silenced by the fact that the tests
passed even though no {{PAssert}} assertions actually happened:
https://github.com/apache/beam/pull/2263
All runners should add such an assertion, as they might also have silent bugs.
There is a ticket to move this assertion to the {{TestPipeline}} side of things
so that runner authors won't have to remember to do this:
https://issues.apache.org/jira/browse/BEAM-1763
In any case, once all runners support metrics, this can be done using metrics.


was (Author: aviemzur):
As [~aljoscha] mentioned, we are adding this assertion to the Flink runner as
well, and it has surfaced bugs which were silenced by the fact that the tests
passed even though no {{PAssert}} assertions actually happened:
https://github.com/apache/beam/pull/2263
All runners should add such an assertion, as they might also have silent bugs.
There is a ticket to move this assertion to the {{TestPipeline}} side of things
so that runner authors won't have to remember to do this:
https://issues.apache.org/jira/browse/BEAM-1763
In any case, once all runners support metrics, this can be done using metrics.
Until then, the only way to do this is with aggregators.

> Remove Aggregators from the Java SDK
> 
>
> Key: BEAM-775
> URL: https://issues.apache.org/jira/browse/BEAM-775
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Pablo Estrada
>  Labels: backward-incompatible
> Fix For: First stable release
>
>






[jira] [Commented] (BEAM-775) Remove Aggregators from the Java SDK

2017-03-21 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935726#comment-15935726
 ] 

Aviem Zur commented on BEAM-775:


As [~aljoscha] mentioned, we are adding this assertion to the Flink runner as
well, and it has surfaced bugs which were silenced by the fact that the tests
passed even though no {{PAssert}} assertions actually happened:
https://github.com/apache/beam/pull/2263
All runners should add such an assertion, as they might also have silent bugs.
There is a ticket to move this assertion to the {{TestPipeline}} side of things
so that runner authors won't have to remember to do this:
https://issues.apache.org/jira/browse/BEAM-1763
In any case, once all runners support metrics, this can be done using metrics.
Until then, the only way to do this is with aggregators.

> Remove Aggregators from the Java SDK
> 
>
> Key: BEAM-775
> URL: https://issues.apache.org/jira/browse/BEAM-775
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Pablo Estrada
>  Labels: backward-incompatible
> Fix For: First stable release
>
>






[jira] [Commented] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur commented on BEAM-1581:
-

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. The {{PTransform}}s
the user will end up using here are {{Write}} and {{Read}} from the {{io}}
package.

Example:
{code}
Write.to(JacksonSink.of(...)) // for writing
Read.from(JacksonSource.of(...)) // for reading
{code}

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:16 PM:
--

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. The {{PTransform}}s
the user will end up using here are {{Write}} and {{Read}} from the {{io}}
package.

Example:
{code}
Write.to(JacksonSink.of(...)) // for writing
Read.from(JacksonSource.of(...)) // for reading
{code}

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Resolved] (BEAM-1704) Create.TimestampedValues should take a TypeDescriptor as an alternative to explicitly specifying the Coder

2017-03-15 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1704.
-
Resolution: Done

> Create.TimestampedValues should take a TypeDescriptor as an alternative to 
> explicitly specifying the Coder
> --
>
> Key: BEAM-1704
> URL: https://issues.apache.org/jira/browse/BEAM-1704
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> https://issues.apache.org/jira/browse/BEAM-1446 added {{TypeDescriptor}} to 
> {{Create.Values}} builder.
> This should also be added to {{Create.TimestampedValues}}.





[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:18 PM:
--

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
We'll need {{JsonSink}} and {{JsonSource}} classes which extend the existing 
abstract {{FileBasedSink}} and {{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:24 PM:
--

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user.

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?

All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I agree that {{JsonIO}} itself should not be abstract and should not implement 
any interface (same as other {{XxxIO}} classes that exist today).
It will be the enclosing class of {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. These 
will not be exposed to the user.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:24 PM:
--

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But
these should not be used directly by the user. All common JSON file logic
regarding how the file should be constructed (as [~jkff] mentioned, this
should be better defined) will be in these sink and source classes, including
all file writing/reading related code (inherited from {{FileBasedSink}} and
{{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user.

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?

All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Updated] (BEAM-1581) JSON source and sink

2017-03-15 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Summary: JSON source and sink  (was: JSON sources and sinks)

> JSON source and sink
> 
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Updated] (BEAM-1704) Create.TimestampedValues should take a TypeDescriptor as an alternative to explicitly specifying the Coder

2017-03-15 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1704:

Fix Version/s: First stable release

> Create.TimestampedValues should take a TypeDescriptor as an alternative to 
> explicitly specifying the Coder
> --
>
> Key: BEAM-1704
> URL: https://issues.apache.org/jira/browse/BEAM-1704
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> https://issues.apache.org/jira/browse/BEAM-1446 added {{TypeDescriptor}} to 
> {{Create.Values}} builder.
> This should also be added to {{Create.TimestampedValues}}.





[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:19 PM:
--

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I agree that {{JsonIO}} itself should not be abstract and should not implement 
any interface (same as other {{XxxIO}} classes that exist today).
It will be the enclosing class of {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. These 
will not be exposed to the user.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
We'll need {{JsonSink}} and {{JsonSource}} classes which extend the existing 
abstract {{FileBasedSink}} and {{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (as
[~jkff] mentioned, this should be better defined) will be in the abstract sink
and source, including all file writing/reading related code (inherited from
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}.

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and
> {{ParseJsons}}.





[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:26 PM:
--

I think we should avoid exposing a contract to the user which promises writing
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user. If the
user does not create valid JSON strings, errors can occur. These errors might
be detected very late in the process, possibly only upon an attempt to consume
the data in another process (which may belong to a different user, as JSON is
often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But
these should not be used directly by the user. All common JSON file logic
regarding how the file should be constructed (as [~jkff] mentioned, this
should be better defined) will be in these sink and source classes, including
all file writing/reading related code (inherited from {{FileBasedSink}} and
{{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from objects to JSON Strings (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?
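For concreteness, a plain-Java sketch of the API shape this question points at. {{JacksonIO}} does not exist in Beam; every name below is a hypothetical stand-in illustrating the proposal (concrete object-level transforms enclosing the String-level sink/source), not a real API.

```java
// Hypothetical sketch only: JacksonIO is a proposed class in this thread,
// not an existing Beam API. These stubs illustrate the suggested shape,
// where users configure object-level transforms and the String-level
// JsonSource/JsonSink stay internal.
public class JacksonIOSketch {

    // Stand-in for the proposed JacksonIO enclosing class.
    static final class JacksonIO {
        static <T> Write<T> write(Class<T> elementType, String outputPath) {
            return new Write<>(elementType, outputPath);
        }
        static <T> Read<T> read(Class<T> elementType, String inputPath) {
            return new Read<>(elementType, inputPath);
        }
    }

    // Stand-ins for the concrete transforms users would apply.
    static final class Write<T> {
        final Class<T> elementType;
        final String outputPath;
        Write(Class<T> elementType, String outputPath) {
            this.elementType = elementType;
            this.outputPath = outputPath;
        }
    }

    static final class Read<T> {
        final Class<T> elementType;
        final String inputPath;
        Read(Class<T> elementType, String inputPath) {
            this.elementType = elementType;
            this.inputPath = inputPath;
        }
    }

    public static void main(String[] args) {
        // Users configure transforms in terms of their object type; the
        // object-to-JSON-String conversion is an internal concern.
        Write<Integer> write = JacksonIO.write(Integer.class, "/tmp/out.json");
        Read<Integer> read = JacksonIO.read(Integer.class, "/tmp/out.json");
        System.out.println(write.outputPath.equals(read.inputPath));
    }
}
```

The point of this shape is that the user-facing surface deals only in typed objects, so invalid-JSON mistakes cannot be introduced at the pipeline boundary.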


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user. All common JSON file logic 
regarding how the file should be constructed (As [~jkff] mentioned this should 
be better defined) will be in these sink and source, including all file 
writing/reading related code (Inherited from {{FileBasedSink}} and 
{{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as JacksonIO?

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:32 PM:
--

I think we should avoid exposing a contract to the user which promises writing 
JSON but accepts strings.
This is a loose contract which leaves JSON validity up to the user: if the user 
does not create valid JSON strings, errors can occur.
Such errors might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user, as JSON is often used for integration).

We definitely need concrete {{JsonSink extends FileBasedSink}} and 
{{JsonSource extends FileBasedSource}} classes. But these should not 
be used directly by the user. All common JSON file logic regarding how the file 
should be constructed (as [~jkff] mentioned, this should be better defined) will 
live in this sink and source, including all file writing/reading code 
(inherited from {{FileBasedSink}} and {{FileBasedSource}}).

In order to avoid exposing classes which deal with strings to the user, we need 
concrete {{PTransform}} classes which deal with objects.

The problem is that these probably can't exist in a {{JsonIO}} class, since it 
cannot contain the transformations from objects to JSON strings (there are 
several ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}} (similar 
to {{AvroIO}})?


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

We definitely need concrete {{JsonSink extends FileBasedSink}} and 
{{JsonSource  extends FileBasedSource}} classes. But these should not 
be used directly by the user. All common JSON file logic regarding how the file 
should be constructed (As [~jkff] mentioned this should be better defined) will 
be in these sink and source, including all file writing/reading related code 
(Inherited from {{FileBasedSink}} and {{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from objects to JSON Strings (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1726) TestFlinkRunner should assert PAssert success/failure counters

2017-03-15 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1726:
---

 Summary: TestFlinkRunner should assert PAssert success/failure 
counters
 Key: BEAM-1726
 URL: https://issues.apache.org/jira/browse/BEAM-1726
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aviem Zur
Assignee: Aljoscha Krettek


TestFlinkRunner should assert the value of {{PAssert}}'s success/failure 
aggregators, i.e. assert that:

PAssert.SUCCESS_COUNTER value = number of assertions defined in the pipeline
and
PAssert.FAILURE_COUNTER value = 0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1726) TestFlinkRunner should assert PAssert success/failure counters

2017-03-15 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1726:

Description: 
{{TestFlinkRunner}} should assert the value of {{PAssert}}'s success/failure 
aggregators, i.e. assert that:

{code}
PAssert.SUCCESS_COUNTER == [number of assertions defined in the pipeline]
&&
PAssert.FAILURE_COUNTER == 0
{code}

  was:
TestFlinkRunner should assert the value of {{PAssert}}'s success/failure 
aggregators. i.e. assert that:

PAssert.SUCCESS_COUNTER value = number of assertions defined in  the pipeline
and
PAssert.FAILURE_COUNTER value = 0


> TestFlinkRunner should assert PAssert success/failure counters
> --
>
> Key: BEAM-1726
> URL: https://issues.apache.org/jira/browse/BEAM-1726
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aviem Zur
>Assignee: Aljoscha Krettek
>
> {{TestFlinkRunner}} should assert the value of {{PAssert}}'s success/failure 
> aggregators, i.e. assert that:
> {code}
> PAssert.SUCCESS_COUNTER == [number of assertions defined in the pipeline]
> &&
> PAssert.FAILURE_COUNTER == 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1482) Support SetState in Spark runner

2017-04-09 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1482:
---

Assignee: Aviem Zur

> Support SetState in Spark runner
> 
>
> Key: BEAM-1482
> URL: https://issues.apache.org/jira/browse/BEAM-1482
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Assignee: Aviem Zur
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1477) Support MapState in Spark runner

2017-04-09 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1477:
---

Assignee: Aviem Zur

> Support MapState in Spark runner
> 
>
> Key: BEAM-1477
> URL: https://issues.apache.org/jira/browse/BEAM-1477
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Assignee: Aviem Zur
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1035) Support for new State API in SparkRunner

2017-04-09 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1035:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Support for new State API in SparkRunner
> 
>
> Key: BEAM-1035
> URL: https://issues.apache.org/jira/browse/BEAM-1035
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Assignee: Aviem Zur
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1692) Convert to Codahale Metric interfaces where possible in WithMetricsSupport MetricRegistry

2017-04-09 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1692:
---

Assignee: (was: Aviem Zur)

> Convert to Codahale Metric interfaces where possible in WithMetricsSupport 
> MetricRegistry
> -
>
> Key: BEAM-1692
> URL: https://issues.apache.org/jira/browse/BEAM-1692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Priority: Minor
>
> Currently, in the Spark runner, all Beam metrics are reported through 
> Codahale as {{Gauge}} metrics.
> Some of the metrics are formatted to look like their counterparts in 
> Codahale's {{GraphiteReporter}} but this is a hack and should be avoided if 
> possible.
> Instead, convert Beam metrics to their Codahale counterparts in 
> {{WithMetricsSupport}} getters where possible.
> Since Beam metrics are not 100% compatible with Codahale metrics, consider 
> where to make tradeoffs and stay with a simple {{Gauge}}.
> Also, rename {{WithMetricsSupport}} to something like 
> {{SparkBeamMetricRegistry}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-848) Shuffle input read-values to get maximum parallelism.

2017-04-03 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953748#comment-15953748
 ] 

Aviem Zur commented on BEAM-848:


Now that BEAM-1074 is resolved, a user can improve parallelism simply by 
setting the {{spark.default.parallelism}} configuration.
[~amitsela] I believe we can close this ticket.

> Shuffle input read-values to get maximum parallelism.
> -
>
> Key: BEAM-848
> URL: https://issues.apache.org/jira/browse/BEAM-848
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> It would be wise to shuffle the read values _after_ flatmap to increase 
> parallelism in processing of the data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1581) JSON source and sink

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Description: 
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
{{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}.

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file.
The most common pattern for this is a large object with an array member which 
holds all the data objects and other members for metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another pattern used in integration is a file which is simply a JSON array of 
objects.
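The two file layouts the description mentions can be sketched as follows (plain JDK, illustrative names only; this is not proposed Beam code):

```java
import java.util.List;

// Illustrative sketch (plain JDK, hypothetical names) of the two common
// JSON file layouts: a wrapper object with metadata plus a data array,
// and a bare top-level array of objects.
public class JsonFileLayouts {

    // Second pattern: the file is simply a JSON array of objects.
    static String bareArray(List<String> records) {
        return "[" + String.join(",", records) + "]";
    }

    // First pattern: a large object whose array member holds the data
    // objects, with sibling members for metadata.
    static String wrapperObject(String name, List<String> records) {
        return "{\"name\":\"" + name + "\",\"count\":" + records.size()
                + ",\"items\":" + bareArray(records) + "}";
    }
}
```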

  was:
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these be a {{JsonSource}}/{{JonSink}} 
which are a {{FileBaseSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file, this normal pattern for this is a large objects 
with an array member which holds all the data objects and other members for 
metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another pattern used in integration is a file which is simply a JSON array of 
objects.


> JSON source and sink
> 
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}.
> The {{PCollection}} of objects the user passes to the transform should be 
> embedded in a valid JSON file.
> The most common pattern for this is a large object with an array member which 
> holds all the data objects and other members for metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another pattern used in integration is a file which is simply a JSON array of 
> objects.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1581) JSON source and sink

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Description: 
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
{{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}.

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file; the normal pattern for this is a large object 
with an array member which holds all the data objects and other members for 
metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another pattern used in integration is a file which is simply a JSON array of 
objects.

  was:
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these be a {{JsonSource}}/{{JonSink}} 
which are a {{FileBaseSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file, this normal pattern for this is a large objects 
with an array member which holds all the data objects and other members for 
metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another common pattern used in integration is a file which is simply a JSON 
array of objects.


> JSON source and sink
> 
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}.
> The {{PCollection}} of objects the user passes to the transform should be 
> embedded in a valid JSON file; the normal pattern for this is a large object 
> with an array member which holds all the data objects and other members for 
> metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another pattern used in integration is a file which is simply a JSON array of 
> objects.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1581) JSON source and sink

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Description: 
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
{{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}.

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file; the normal pattern for this is a large object 
with an array member which holds all the data objects and other members for 
metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another common pattern used in integration is a file which is simply a JSON 
array of objects.

  was:
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these be a {{JsonSource}}/{{JonSink}} 
which are a {{FileBaseSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}


> JSON source and sink
> 
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}.
> The {{PCollection}} of objects the user passes to the transform should be 
> embedded in a valid JSON file; the normal pattern for this is a large object 
> with an array member which holds all the data objects and other members for 
> metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another common pattern used in integration is a file which is simply a JSON 
> array of objects.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1858) improve error message when Create.of() is called with an empty iterator

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1858:
---

Assignee: Davor Bonaci  (was: Frances Perry)

> improve error message when Create.of() is called with an empty iterator
> ---
>
> Key: BEAM-1858
> URL: https://issues.apache.org/jira/browse/BEAM-1858
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Wesley Tanaka
>Assignee: Davor Bonaci
>
> The current error copy states:
> "java.lang.IllegalArgumentException: Elements must be provided to construct 
> the default Create Coder. To Create an empty PCollection, either call 
> Create.empty(Coder), or call 'withCoder(Coder)' on the result PTransform"
> This is potentially confusing for two reasons:
> 1. "the default Create Coder" assumes a high level of knowledge of how Create 
> class works
> 2. since "Create" is a common word, it may not be immediately clear that 
> Create is referring to org.apache.beam.sdk.transforms.Create as opposed to 
> the possibility that there might be a compound noun in Beam model called 
> "Create Coder"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1858) improve error message when Create.of() is called with an empty iterator

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1858:

Priority: Trivial  (was: Major)

> improve error message when Create.of() is called with an empty iterator
> ---
>
> Key: BEAM-1858
> URL: https://issues.apache.org/jira/browse/BEAM-1858
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Wesley Tanaka
>Assignee: Davor Bonaci
>Priority: Trivial
>
> The current error copy states:
> "java.lang.IllegalArgumentException: Elements must be provided to construct 
> the default Create Coder. To Create an empty PCollection, either call 
> Create.empty(Coder), or call 'withCoder(Coder)' on the result PTransform"
> This is potentially confusing for two reasons:
> 1. "the default Create Coder" assumes a high level of knowledge of how Create 
> class works
> 2. since "Create" is a common word, it may not be immediately clear that 
> Create is referring to org.apache.beam.sdk.transforms.Create as opposed to 
> the possibility that there might be a compound noun in Beam model called 
> "Create Coder"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1858) improve error message when Create.of() is called with an empty iterator

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1858:

Component/s: (was: beam-model)
 sdk-java-core

> improve error message when Create.of() is called with an empty iterator
> ---
>
> Key: BEAM-1858
> URL: https://issues.apache.org/jira/browse/BEAM-1858
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Wesley Tanaka
>Assignee: Frances Perry
>
> The current error copy states:
> "java.lang.IllegalArgumentException: Elements must be provided to construct 
> the default Create Coder. To Create an empty PCollection, either call 
> Create.empty(Coder), or call 'withCoder(Coder)' on the result PTransform"
> This is potentially confusing for two reasons:
> 1. "the default Create Coder" assumes a high level of knowledge of how Create 
> class works
> 2. since "Create" is a common word, it may not be immediately clear that 
> Create is referring to org.apache.beam.sdk.transforms.Create as opposed to 
> the possibility that there might be a compound noun in Beam model called 
> "Create Coder"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1581) JSON source and sink

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Description: 
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
{{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}.

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file.
The most common pattern for this is a large object with an array member which 
holds all the data objects and other members for metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another pattern used is a file which is simply a JSON array of objects.

  was:
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these be a {{JsonSource}}/{{JonSink}} 
which are a {{FileBaseSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}

The {{PCollection}} of objects the user passes to the transform should be 
embedded in a valid JSON file
The most common pattern for this is a large object with an array member which 
holds all the data objects and other members for metadata.
Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
Another pattern used in integration is a file which is simply a JSON array of 
objects.


> JSON source and sink
> 
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}.
> The {{PCollection}} of objects the user passes to the transform should be 
> embedded in a valid JSON file.
> The most common pattern for this is a large object with an array member which 
> holds all the data objects and other members for metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another pattern used is a file which is simply a JSON array of objects.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1603) Enable programmatic execution of spark pipelines.

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1603:
---

Assignee: (was: Aviem Zur)

> Enable programmatic execution of spark pipelines.
> -
>
> Key: BEAM-1603
> URL: https://issues.apache.org/jira/browse/BEAM-1603
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, testing
>Reporter: Jason Kuster
>
> In order to enable execution of Spark Integration Tests against a cluster, it 
> is necessary to have the ability to execute Spark pipelines via maven, rather 
> than spark-submit. The minimum necessary is to enable this in the 
> TestSparkRunner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1337) Use our coder infrastructure for coders for state

2017-04-01 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1337.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Use our coder infrastructure for coders for state
> -
>
> Key: BEAM-1337
> URL: https://issues.apache.org/jira/browse/BEAM-1337
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Aviem Zur
>Priority: Minor
> Fix For: First stable release
>
>
> Today the user must explicitly provide any coders needed for serializing 
> state data. We'd rather use the coder registry and infer the coder.
> Currently all factory methods in {{StateSpecs}} take a coder argument. For 
> example:
> {code}
> StateSpecs.value(coderForT);
> {code}
> We could leverage the coder registry and provide different overloads:
> TypeDescriptor:
> {code}
> StateSpecs.value(typeDescriptorForT); 
> {code}
> Reflection:
> {code}
> StateSpecs.value();
> {code}
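The inference idea can be sketched generically (plain JDK; `Coder`, `register`, and `infer` here are hypothetical stand-ins, not Beam's actual CoderRegistry API): the caller hands over a type token and the registry supplies the coder, instead of the coder being passed explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch (hypothetical names, not Beam's actual CoderRegistry): infer a
// coder from a registered type instead of requiring the caller to pass one.
public class CoderRegistrySketch {
    interface Coder<T> { String encode(T value); }

    private final Map<Class<?>, Coder<?>> coders = new HashMap<>();

    <T> void register(Class<T> type, Coder<T> coder) {
        coders.put(type, coder);
    }

    // Analogue of an overload like StateSpecs.value(typeDescriptorForT):
    // the type token is enough, the coder itself is looked up.
    @SuppressWarnings("unchecked")
    <T> Coder<T> infer(Class<T> type) {
        Coder<T> coder = (Coder<T>) coders.get(type);
        if (coder == null) {
            throw new IllegalStateException("No coder registered for " + type);
        }
        return coder;
    }
}
```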



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1726) TestFlinkRunner should assert PAssert success/failure counters

2017-04-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1726:
---

Assignee: Aljoscha Krettek  (was: Aviem Zur)

> TestFlinkRunner should assert PAssert success/failure counters
> --
>
> Key: BEAM-1726
> URL: https://issues.apache.org/jira/browse/BEAM-1726
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aviem Zur
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: First stable release
>
>
> {{TestFlinkRunner}} should assert the value of {{PAssert}}'s success/failure 
> aggregators, i.e. assert that:
> {code}
> PAssert.SUCCESS_COUNTER == [number of assertions defined in the pipeline]
> &&
> PAssert.FAILURE_COUNTER == 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1950) Access to static field shouldn't be protected by lock on non-static MicrobatchSource.this

2017-04-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1950:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Access to static field shouldn't be protected by lock on non-static 
> MicrobatchSource.this
> -
>
> Key: BEAM-1950
> URL: https://issues.apache.org/jira/browse/BEAM-1950
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ted Yu
>Assignee: Aviem Zur
>Priority: Minor
>
> In MicrobatchSource :
> {code}
>   private synchronized void initReaderCache(long readerCacheInterval) {
> if (readerCache == null) {
>   LOG.info("Creating reader cache. Cache interval = " + 
> readerCacheInterval + " ms.");
>   readerCache =
>   CacheBuilder.newBuilder()
> {code}
> readerCache is static.
> Access to readerCache shouldn't be protected by lock on non-static 
> MicrobatchSource.this



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1950) Access to static field shouldn't be protected by lock on non-static MicrobatchSource.this

2017-04-12 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966246#comment-15966246
 ] 

Aviem Zur commented on BEAM-1950:
-

Yes. {{initReaderCache}} should be marked static.
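A minimal sketch of that fix (illustrative names, not the real MicrobatchSource): making the initializer {{static synchronized}} locks the class object rather than an instance, so every instance contends on the same monitor guarding the static field.

```java
// Illustrative sketch (not the actual MicrobatchSource): a static field must
// be guarded by a class-level lock, which `static synchronized` provides.
public class ReaderCacheHolder {
    private static Object readerCache; // shared by all instances

    // Locks ReaderCacheHolder.class rather than an instance, so the
    // check-then-init on the static field is actually atomic.
    static synchronized void initReaderCache(long readerCacheIntervalMs) {
        if (readerCache == null) {
            readerCache = "cache(interval=" + readerCacheIntervalMs + "ms)";
        }
    }

    static synchronized Object cache() {
        return readerCache;
    }
}
```

With the instance-level lock, two instances could each hold their own monitor and race on the shared field; the class-level lock closes that gap.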

> Access to static field shouldn't be protected by lock on non-static 
> MicrobatchSource.this
> -
>
> Key: BEAM-1950
> URL: https://issues.apache.org/jira/browse/BEAM-1950
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ted Yu
>Assignee: Amit Sela
>Priority: Minor
>
> In MicrobatchSource :
> {code}
>   private synchronized void initReaderCache(long readerCacheInterval) {
> if (readerCache == null) {
>   LOG.info("Creating reader cache. Cache interval = " + 
> readerCacheInterval + " ms.");
>   readerCache =
>   CacheBuilder.newBuilder()
> {code}
> readerCache is static.
> Access to readerCache shouldn't be protected by lock on non-static 
> MicrobatchSource.this



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1336) A StateSpec that doesn't care about the key shouldn't be forced to declare it as type Object

2017-04-07 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1336:
---

Assignee: (was: Aviem Zur)

> A StateSpec that doesn't care about the key shouldn't be forced to declare it 
> as type Object
> 
>
> Key: BEAM-1336
> URL: https://issues.apache.org/jira/browse/BEAM-1336
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: starter
>
> In the state API as it exists today, if (as is often the case) you are 
> writing a {{StateSpec}} other than a {{KeyedCombiningState}}, the 
> key type is irrelevant and the user just has to write {{Object}} there. This 
> was carried over from {{StateTag}} and is an artifact of the visitor pattern 
> there and the difficulty of getting all the types to line up.
> I think simplifying the visitor pattern to be more of just a syntax traversal 
> might alleviate the issue and allow us to drop this noise from the syntax.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1294) Long running UnboundedSource Readers

2017-04-09 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1294.
-
   Resolution: Implemented
Fix Version/s: First stable release

> Long running UnboundedSource Readers
> 
>
> Key: BEAM-1294
> URL: https://issues.apache.org/jira/browse/BEAM-1294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> When reading from an UnboundedSource, current implementation will cause each 
> split to create a new Reader every micro-batch.
> As long as the overhead of creating a reader is relatively low, it's 
> reasonable (though I'd still be happy to get rid of), but in cases where the 
> creation overhead is large it becomes unreasonable forcing large batches.
> One way to solve this could be to create a pool of lazy-init readers to serve 
> each executor, maybe via Broadcast variables. 





[jira] [Created] (BEAM-1957) Missing DoFn annotations documentation

2017-04-12 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1957:
---

 Summary: Missing DoFn annotations documentation
 Key: BEAM-1957
 URL: https://issues.apache.org/jira/browse/BEAM-1957
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: Aviem Zur
Assignee: Davor Bonaci


Not all {{DoFn}} annotations are covered by the programming guide.
Only {{@ProcessElement}} is currently covered.
We should have documentation for the other (non-experimental, at least) 
annotations:
{code}
public @interface Setup
public @interface StartBundle
public @interface FinishBundle
public @interface Teardown
{code}
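
As a plain-Java sketch (deliberately not the Beam API; class and method names are illustrative) of the lifecycle these annotations mark: {{@Setup}} is invoked once per {{DoFn}} instance, {{@StartBundle}}/{{@FinishBundle}} around each bundle, {{@ProcessElement}} per element, and {{@Teardown}} once when the instance is discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the order in which a runner invokes a DoFn's lifecycle
// methods; each method just records its own call.
public class DoFnLifecycleSketch {
  final List<String> calls = new ArrayList<>();

  void setup() { calls.add("setup"); }               // @Setup
  void startBundle() { calls.add("startBundle"); }   // @StartBundle
  void processElement(String e) { calls.add("process:" + e); } // @ProcessElement
  void finishBundle() { calls.add("finishBundle"); } // @FinishBundle
  void teardown() { calls.add("teardown"); }         // @Teardown

  // Runs the given bundles on one instance, in lifecycle order.
  List<String> run(List<List<String>> bundles) {
    setup();
    for (List<String> bundle : bundles) {
      startBundle();
      for (String element : bundle) {
        processElement(element);
      }
      finishBundle();
    }
    teardown();
    return calls;
  }
}
```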





[jira] [Created] (BEAM-1958) Standard IO Metrics in Java SDK

2017-04-12 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1958:
---

 Summary: Standard IO Metrics in Java SDK
 Key: BEAM-1958
 URL: https://issues.apache.org/jira/browse/BEAM-1958
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Aviem Zur


Create class {{IOMetrics}} with factories to create standard IO metrics.





[jira] [Created] (BEAM-1959) Standard IO Metrics in Python SDK

2017-04-12 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1959:
---

 Summary: Standard IO Metrics in Python SDK
 Key: BEAM-1959
 URL: https://issues.apache.org/jira/browse/BEAM-1959
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py
Reporter: Aviem Zur
Assignee: Ahmet Altay








[jira] [Resolved] (BEAM-1950) Access to static field shouldn't be protected by lock on non-static MicrobatchSource.this

2017-04-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1950.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Access to static field shouldn't be protected by lock on non-static 
> MicrobatchSource.this
> -
>
> Key: BEAM-1950
> URL: https://issues.apache.org/jira/browse/BEAM-1950
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ted Yu
>Assignee: Aviem Zur
>Priority: Minor
> Fix For: First stable release
>
>
> In MicrobatchSource :
> {code}
>   private synchronized void initReaderCache(long readerCacheInterval) {
> if (readerCache == null) {
>   LOG.info("Creating reader cache. Cache interval = " + 
> readerCacheInterval + " ms.");
>   readerCache =
>   CacheBuilder.newBuilder()
> {code}
> readerCache is static.
> Access to readerCache shouldn't be protected by lock on non-static 
> MicrobatchSource.this





[jira] [Updated] (BEAM-1950) Access to static field shouldn't be protected by lock on non-static MicrobatchSource.this

2017-04-12 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1950:

Fix Version/s: (was: First stable release)

> Access to static field shouldn't be protected by lock on non-static 
> MicrobatchSource.this
> -
>
> Key: BEAM-1950
> URL: https://issues.apache.org/jira/browse/BEAM-1950
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ted Yu
>Assignee: Aviem Zur
>Priority: Minor
>
> In MicrobatchSource :
> {code}
>   private synchronized void initReaderCache(long readerCacheInterval) {
> if (readerCache == null) {
>   LOG.info("Creating reader cache. Cache interval = " + 
> readerCacheInterval + " ms.");
>   readerCache =
>   CacheBuilder.newBuilder()
> {code}
> readerCache is static.
> Access to readerCache shouldn't be protected by lock on non-static 
> MicrobatchSource.this





[jira] [Created] (BEAM-2038) PTransform identifier(name) documentation in Python SDK.

2017-04-20 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2038:
---

 Summary: PTransform identifier(name) documentation in Python SDK.
 Key: BEAM-2038
 URL: https://issues.apache.org/jira/browse/BEAM-2038
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py
Reporter: Aviem Zur
Assignee: Ahmet Altay


PTransform identifier(name) documentation in Python SDK.

Specifically address this in documentation on the application of 
{{PTransform}}s in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037





[jira] [Updated] (BEAM-2038) PTransform identifier(name) documentation in Python SDK.

2017-04-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2038:

Description: 
PTransform identifier(name) documentation in Python SDK.

Specifically address this in documentation of the application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037

  was:
PTransform identifier(name) documentation in Python SDK.

Specifically address this in address this in documentation on application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037


> PTransform identifier(name) documentation in Python SDK.
> 
>
> Key: BEAM-2038
> URL: https://issues.apache.org/jira/browse/BEAM-2038
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py
>Reporter: Aviem Zur
>Assignee: Ahmet Altay
>
> PTransform identifier(name) documentation in Python SDK.
> Specifically address this in documentation of the application of a 
> {{PTransform}} in the pipeline and in code pertaining to metrics.
> See parent issue BEAM-2035 for more information.
> Take a look at the Java SDK subtask as well: BEAM-2037





[jira] [Created] (BEAM-2037) PTransform identifier(name) documentation in Java SDK.

2017-04-20 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2037:
---

 Summary: PTransform identifier(name) documentation in Java SDK.
 Key: BEAM-2037
 URL: https://issues.apache.org/jira/browse/BEAM-2037
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Davor Bonaci


PTransform identifier(name) documentation in Java SDK.

Specifically address this in the Javadoc of {{Pipeline#apply}}, 
{{PBegin#apply}}, {{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, 
{{MetricsFilter}}.

Documentation around the Metrics {{step}} (which corresponds to the PTransform 
application identifier) in the classes mentioned above is currently a bit 
confusing.

See parent issue BEAM-2035 for more information.





[jira] [Created] (BEAM-2036) PTransform identifier(name) documentation in website.

2017-04-20 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2036:
---

 Summary: PTransform identifier(name) documentation in website.
 Key: BEAM-2036
 URL: https://issues.apache.org/jira/browse/BEAM-2036
 Project: Beam
  Issue Type: Sub-task
  Components: website
Reporter: Aviem Zur
Assignee: Davor Bonaci


PTransform identifier(name) documentation in website.
Section 'Applying transforms' - 
https://beam.apache.org/documentation/programming-guide/#transforms
See parent issue BEAM-2035 for more information.





[jira] [Updated] (BEAM-2036) PTransform identifier(name) documentation in website.

2017-04-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2036:

Description: 
PTransform identifier(name) documentation in the website.
Section 'Applying transforms' - 
https://beam.apache.org/documentation/programming-guide/#transforms
See parent issue BEAM-2035 for more information.

  was:
PTransform identifier(name) documentation in website.
Section 'Applying transforms' - 
https://beam.apache.org/documentation/programming-guide/#transforms
See parent issue BEAM-2035 for more information.


> PTransform identifier(name) documentation in website.
> -
>
> Key: BEAM-2036
> URL: https://issues.apache.org/jira/browse/BEAM-2036
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> PTransform identifier(name) documentation in the website.
> Section 'Applying transforms' - 
> https://beam.apache.org/documentation/programming-guide/#transforms
> See parent issue BEAM-2035 for more information.





[jira] [Assigned] (BEAM-1581) JSON source and sink

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1581:
---

Assignee: (was: Aviem Zur)

> JSON source and sink
> 
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these should be a 
> {{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}
> The {{PCollection}} of objects the user passes to the transform should be 
> embedded in a valid JSON file.
> The most common pattern for this is a large object with an array member which 
> holds all the data objects and other members for metadata.
> Examples of public JSON APIs: https://www.sitepoint.com/10-example-json-files/
> Another pattern used is a file which is simply a JSON array of objects.
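
For illustration, the "JSON array of objects" file layout mentioned above could be produced like this (plain Java; the helper name is hypothetical, and in a real {{JsonSink}} the per-object strings would come from something like {{AsJsons}}):

```java
import java.util.List;

// Illustrative only: joins pre-serialized JSON objects into the
// "file is a JSON array of objects" layout a JsonSink could emit.
public class JsonArrayLayoutSketch {
  static String toJsonArrayFile(List<String> jsonObjects) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < jsonObjects.size(); i++) {
      if (i > 0) {
        sb.append(",");
      }
      sb.append(jsonObjects.get(i));
    }
    return sb.append("]").toString();
  }
}
```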





[jira] [Updated] (BEAM-2035) PTransform application identifier(name) documentation.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2035:

Description: 
Add documentation around the user supplied/automatically created identifier 
(name) of the application of a {{PTransform}} within the pipeline.

Make sure to relate to how it is constructed when the application is within a 
composite transform.

Relate to how this name affects metrics aggregation and metrics querying.

  was:
Documentation around the user supplied/automatically created identifier (name) 
of the application of a {{PTransform}} within the pipeline.

Make sure to relate to how it is constructed when the application is within a 
composite transform.

Relate to how this name affects metrics aggregation and metrics querying.


> PTransform application identifier(name) documentation.
> --
>
> Key: BEAM-2035
> URL: https://issues.apache.org/jira/browse/BEAM-2035
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py, website
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Add documentation around the user supplied/automatically created identifier 
> (name) of the application of a {{PTransform}} within the pipeline.
> Make sure to relate to how it is constructed when the application is within a 
> composite transform.
> Relate to how this name affects metrics aggregation and metrics querying.





[jira] [Updated] (BEAM-2038) PTransform identifier(name) documentation in Python SDK.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2038:

Description: 
Add PTransform identifier(name) documentation in Python SDK.

Specifically address this in documentation of the application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037. Ideas from that subtask 
can be applied here.

  was:
Add PTransform identifier(name) documentation in Python SDK.

Specifically address this in in documentation of application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037


> PTransform identifier(name) documentation in Python SDK.
> 
>
> Key: BEAM-2038
> URL: https://issues.apache.org/jira/browse/BEAM-2038
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py
>Reporter: Aviem Zur
>Assignee: Ahmet Altay
>
> Add PTransform identifier(name) documentation in Python SDK.
> Specifically address this in documentation of the application of a 
> {{PTransform}} in the pipeline and in code pertaining to metrics.
> See parent issue BEAM-2035 for more information.
> Take a look at the Java SDK subtask as well: BEAM-2037. Ideas from that 
> subtask can be applied here.





[jira] [Updated] (BEAM-2037) PTransform identifier(name) documentation in Java SDK.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2037:

Description: 
Add/improve PTransform identifier(name) documentation in Java SDK.

Specifically address this in Javadoc of {{Pipeline#apply}}, {{PBegin#apply}}, 
{{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, {{MetricsFilter}}.

Documentation around the Metrics {{step}} (which corresponds to the PTransform 
application identifier) in the classes mentioned above is currently a bit 
confusing; it should be reworded in line with the changes to 
{{Pipeline#apply}}, {{PBegin#apply}}, {{PCollection#apply}}.

See parent issue BEAM-2035 for more information.

  was:
Add PTransform identifier(name) documentation in Java SDK.

Specifically address this in Javadoc of {{Pipeline#apply}}, {{PBegin#apply}}, 
{{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, {{MetricsFilter}}.

Documentation around Metrics {{step}} (Which corresponds to the PTransform 
application identifier) in the classes mentioned above is a bit confusing 
currently, this should be reworded in correlation with the changes 
{{Pipeline#apply}}, {{PBegin#apply}}, {{PCollection#apply}}.

See parent issue BEAM-2035 for more information.


> PTransform identifier(name) documentation in Java SDK.
> --
>
> Key: BEAM-2037
> URL: https://issues.apache.org/jira/browse/BEAM-2037
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Add/improve PTransform identifier(name) documentation in Java SDK.
> Specifically address this in Javadoc of {{Pipeline#apply}}, {{PBegin#apply}}, 
> {{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, {{MetricsFilter}}.
> Documentation around the Metrics {{step}} (which corresponds to the PTransform 
> application identifier) in the classes mentioned above is currently a bit 
> confusing; it should be reworded in line with the changes to 
> {{Pipeline#apply}}, {{PBegin#apply}}, {{PCollection#apply}}.
> See parent issue BEAM-2035 for more information.





[jira] [Updated] (BEAM-2038) PTransform identifier(name) documentation in Python SDK.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2038:

Description: 
Add/improve PTransform identifier(name) documentation in Python SDK.

Specifically address this in documentation of the application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037. Ideas from that subtask 
can be applied here.

  was:
Add PTransform identifier(name) documentation in Python SDK.

Specifically address this in in documentation of application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037. Ideas from that subtask 
can be applied here.


> PTransform identifier(name) documentation in Python SDK.
> 
>
> Key: BEAM-2038
> URL: https://issues.apache.org/jira/browse/BEAM-2038
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py
>Reporter: Aviem Zur
>Assignee: Ahmet Altay
>
> Add/improve PTransform identifier(name) documentation in Python SDK.
> Specifically address this in documentation of the application of a 
> {{PTransform}} in the pipeline and in code pertaining to metrics.
> See parent issue BEAM-2035 for more information.
> Take a look at the Java SDK subtask as well: BEAM-2037. Ideas from that 
> subtask can be applied here.





[jira] [Updated] (BEAM-2038) PTransform identifier(name) documentation in Python SDK.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2038:

Description: 
Add PTransform identifier(name) documentation in Python SDK.

Specifically address this in documentation of the application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037

  was:
PTransform identifier(name) documentation in Python SDK.

Specifically address this in in documentation of application of a 
{{PTransform}} in the pipeline and in code pertaining to metrics.

See parent issue BEAM-2035 for more information.
Take a look at the Java SDK subtask as well: BEAM-2037


> PTransform identifier(name) documentation in Python SDK.
> 
>
> Key: BEAM-2038
> URL: https://issues.apache.org/jira/browse/BEAM-2038
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py
>Reporter: Aviem Zur
>Assignee: Ahmet Altay
>
> Add PTransform identifier(name) documentation in Python SDK.
> Specifically address this in documentation of the application of a 
> {{PTransform}} in the pipeline and in code pertaining to metrics.
> See parent issue BEAM-2035 for more information.
> Take a look at the Java SDK subtask as well: BEAM-2037





[jira] [Updated] (BEAM-2036) PTransform identifier(name) documentation in website.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2036:

Description: 
Add PTransform identifier(name) documentation to the website.
Section 'Applying transforms' - 
https://beam.apache.org/documentation/programming-guide/#transforms
See parent issue BEAM-2035 for more information.

  was:
PTransform identifier(name) documentation in the website.
Section 'Applying transforms' - 
https://beam.apache.org/documentation/programming-guide/#transforms
See parent issue BEAM-2035 for more information.


> PTransform identifier(name) documentation in website.
> -
>
> Key: BEAM-2036
> URL: https://issues.apache.org/jira/browse/BEAM-2036
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Add PTransform identifier(name) documentation to the website.
> Section 'Applying transforms' - 
> https://beam.apache.org/documentation/programming-guide/#transforms
> See parent issue BEAM-2035 for more information.





[jira] [Updated] (BEAM-2037) PTransform identifier(name) documentation in Java SDK.

2017-04-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2037:

Description: 
Add PTransform identifier(name) documentation in Java SDK.

Specifically address this in Javadoc of {{Pipeline#apply}}, {{PBegin#apply}}, 
{{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, {{MetricsFilter}}.

Documentation around the Metrics {{step}} (which corresponds to the PTransform 
application identifier) in the classes mentioned above is currently a bit 
confusing; it should be reworded in line with the changes to 
{{Pipeline#apply}}, {{PBegin#apply}}, {{PCollection#apply}}.

See parent issue BEAM-2035 for more information.

  was:
PTransform identifier(name) documentation in Java SDK.

Specifically address this in Javadoc of {{Pipeline#apply}}, {{PBegin#apply}}, 
{{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, {{MetricsFilter}}.

Documentation around Metrics {{step}} (Which corresponds to the PTransform 
application identifier) in the classes mentioned above is a bit confusing 
currently.

See parent issue BEAM-2035 for more information.


> PTransform identifier(name) documentation in Java SDK.
> --
>
> Key: BEAM-2037
> URL: https://issues.apache.org/jira/browse/BEAM-2037
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Add PTransform identifier(name) documentation in Java SDK.
> Specifically address this in Javadoc of {{Pipeline#apply}}, {{PBegin#apply}}, 
> {{PCollection#apply}}, {{Metrics}}, {{MetricsResult}}, {{MetricsFilter}}.
> Documentation around the Metrics {{step}} (which corresponds to the PTransform 
> application identifier) in the classes mentioned above is currently a bit 
> confusing; it should be reworded in line with the changes to 
> {{Pipeline#apply}}, {{PBegin#apply}}, {{PCollection#apply}}.
> See parent issue BEAM-2035 for more information.





[jira] [Assigned] (BEAM-1757) Improve log configuration when using multiple logging frameworks in Pipeline

2017-04-14 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1757:
---

Assignee: (was: Aviem Zur)

> Improve log configuration when using multiple logging frameworks in Pipeline
> 
>
> Key: BEAM-1757
> URL: https://issues.apache.org/jira/browse/BEAM-1757
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ismaël Mejía
>Priority: Minor
>
> We need to improve / document how to configure the different logging 
> frameworks to support the different Beam modules (SDK/Runners/IOs).
> I have seen this reported already in slack, but it also bit me recently when 
> trying to log things with a module whose dependencies only supported java 
> based logging over slf4j (no log4j). I tried to provide the classical 
> log4j.properties but I could not get it to log anything, nothing from the 
> sdk, and logically nothing from the java logging framework.
> [Discussion on dev 
> list|https://lists.apache.org/thread.html/502fc4a0575534ddd5f6abffd0743de8d2ccc7fee078b5922c2e1300@%3Cdev.beam.apache.org%3E]





[jira] [Created] (BEAM-1975) Documentation around logging in different runners

2017-04-13 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1975:
---

 Summary: Documentation around logging in different runners
 Key: BEAM-1975
 URL: https://issues.apache.org/jira/browse/BEAM-1975
 Project: Beam
  Issue Type: Sub-task
  Components: website
Reporter: Aviem Zur
Assignee: Davor Bonaci








[jira] [Updated] (BEAM-1975) Documentation around logging in different runners

2017-04-14 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1975:

Description: 
Add documentation on how to configure logging in different runners, relate to 
SLF4J and bindings, and which binding is used in which runner.
Add helpful links to the different logging configuration guides for the 
bindings used in each runner.

> Documentation around logging in different runners
> -
>
> Key: BEAM-1975
> URL: https://issues.apache.org/jira/browse/BEAM-1975
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Add documentation on how to configure logging in different runners, relate to 
> SLF4J and bindings, and which binding is used in which runner.
> Add helpful links to the different logging configuration guides for the 
> bindings used in each runner.





[jira] [Created] (BEAM-1976) Allow only one runner profile active at once in examples and archetypes

2017-04-13 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1976:
---

 Summary: Allow only one runner profile active at once in examples 
and archetypes
 Key: BEAM-1976
 URL: https://issues.apache.org/jira/browse/BEAM-1976
 Project: Beam
  Issue Type: Sub-task
  Components: examples-java
Reporter: Aviem Zur
Assignee: Frances Perry


Since only one SLF4J logger binding is allowed in the classpath, we shouldn't 
allow more than one runner profile to be active at once in our 
examples/archetype modules since different runners use different bindings.
Also, remove slf4j-jdk14 dependency from root and place it instead in 
direct-runner and dataflow-runner profiles, for the same reason.
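
A hedged sketch of the profile shape described above (profile id and scope are illustrative; {{org.slf4j:slf4j-jdk14}} is the real binding artifact): the binding moves out of the root pom and into a single runner profile, so only one SLF4J binding is ever on the classpath.

```xml
<!-- Sketch: declare the JDK14 binding only inside the runner profile
     that needs it, instead of in the root pom. -->
<profile>
  <id>direct-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jdk14</artifactId>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>
```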





[jira] [Created] (BEAM-1974) Metrics documentation

2017-04-13 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1974:
---

 Summary: Metrics documentation
 Key: BEAM-1974
 URL: https://issues.apache.org/jira/browse/BEAM-1974
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: Aviem Zur
Assignee: Davor Bonaci


Document metrics API and uses (make sure to remark that it is still 
experimental).





[jira] [Updated] (BEAM-1757) Improve log configuration when using multiple logging frameworks in Pipeline

2017-04-13 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1757:

Description: 
We need to improve / document how to configure the different logging frameworks 
to support the different Beam modules (SDK/Runners/IOs).

I have seen this reported already in slack, but it also bit me recently when 
trying to log things with a module whose dependencies only supported java based 
logging over slf4j (no log4j). I tried to provide the classical 
log4j.properties but I could not get it to log anything, nothing from the sdk, 
and logically nothing from the java logging framework.

[Discussion on dev 
list|https://lists.apache.org/thread.html/502fc4a0575534ddd5f6abffd0743de8d2ccc7fee078b5922c2e1300@%3Cdev.beam.apache.org%3E]

  was:
We need to improve / document how to configure the different logging frameworks 
to support the different Beam modules (SDK/Runners/IOs).

I have seen this reported already in slack, but it also bit me recently when 
trying to log things with a module whose dependencies only supported java based 
logging over slf4j (no log4j). I tried to provide the classical 
log4j.properties but I could not get it to log anything, nothing from the sdk, 
and logically nothing from the java logging framework.


> Improve log configuration when using multiple logging frameworks in Pipeline
> 
>
> Key: BEAM-1757
> URL: https://issues.apache.org/jira/browse/BEAM-1757
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>Priority: Minor
>
> We need to improve / document how to configure the different logging 
> frameworks to support the different Beam modules (SDK/Runners/IOs).
> I have seen this reported already in slack, but it also bit me recently when 
> trying to log things with a module whose dependencies only supported java 
> based logging over slf4j (no log4j). I tried to provide the classical 
> log4j.properties but I could not get it to log anything, nothing from the 
> sdk, and logically nothing from the java logging framework.
> [Discussion on dev 
> list|https://lists.apache.org/thread.html/502fc4a0575534ddd5f6abffd0743de8d2ccc7fee078b5922c2e1300@%3Cdev.beam.apache.org%3E]





[jira] [Assigned] (BEAM-1757) Improve log configuration when using multiple logging frameworks in Pipeline

2017-04-13 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1757:
---

Assignee: (was: Aviem Zur)

> Improve log configuration when using multiple logging frameworks in Pipeline
> 
>
> Key: BEAM-1757
> URL: https://issues.apache.org/jira/browse/BEAM-1757
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ismaël Mejía
>Priority: Minor
>
> We need to improve / document how to configure the different logging 
> frameworks to support the different Beam modules (SDK/Runners/IOs).
> I have seen this reported already in slack, but it also bit me recently when 
> trying to log things with a module whose dependencies only supported java 
> based logging over slf4j (no log4j). I tried to provide the classical 
> log4j.properties but I could not get it to log anything, nothing from the 
> sdk, and logically nothing from the java logging framework.
> [Discussion on dev 
> list|https://lists.apache.org/thread.html/502fc4a0575534ddd5f6abffd0743de8d2ccc7fee078b5922c2e1300@%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-04-14 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969101#comment-15969101
 ] 

Aviem Zur commented on BEAM-1802:
-

Yes, it is working as designed; there is an inconsistency regarding the 
semantics of pipeline termination between the runners, and this should be 
addressed. There is an [ongoing 
thread|https://lists.apache.org/thread.html/86831496a08fe148e3b982cdb904f828f262c0b571543a9fed7b915d@%3Cdev.beam.apache.org%3E]
 in the dev list regarding this.

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1961) Standard IO metrics documentation

2017-04-13 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1961:
---

 Summary: Standard IO metrics documentation
 Key: BEAM-1961
 URL: https://issues.apache.org/jira/browse/BEAM-1961
 Project: Beam
  Issue Type: Sub-task
  Components: website
Reporter: Aviem Zur
Assignee: Davor Bonaci






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975193#comment-15975193
 ] 

Aviem Zur commented on BEAM-2005:
-

I'll refine - it should be in a separate module, but I think core should depend 
on it (i.e. out of the box functionality).

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975196#comment-15975196
 ] 

Aviem Zur commented on BEAM-2005:
-

Wouldn't this ticket mean actually implementing 
https://github.com/apache/beam/blob/master/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java
 ?

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2005:

Comment: was deleted

(was: I'll refine - it should be in a separate module, but I think core should 
depend on it (i.e. out of the box functionality).)

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, azure, s3, etc..)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1984) Enable dependency analysis of non-compile dependencies

2017-04-16 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970425#comment-15970425
 ] 

Aviem Zur commented on BEAM-1984:
-

Another issue with the blanket {{ignoreNonCompile}} is that genuinely unused 
dependencies could be silently hidden by it. A more accurate approach is to 
remove it and instead exclude, in each relevant module, the specific 
dependencies that are needed at certain scopes even though they are not used 
at compile time. This can be achieved using 
https://maven.apache.org/plugins/maven-dependency-plugin/analyze-mojo.html#ignoredUnusedDeclaredDependencies
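
A hedged sketch of what such a per-module exclusion could look like in a 
module's pom.xml, using the {{ignoredUnusedDeclaredDependencies}} parameter 
referenced above (the artifact coordinates are placeholders, not actual Beam 
dependencies):

{code}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <configuration>
    <!-- Instead of a blanket ignoreNonCompile, list only the dependencies
         this module genuinely needs at runtime/test scope even though the
         analyzer sees them as unused at compile time. -->
    <ignoredUnusedDeclaredDependencies>
      <ignoredUnusedDeclaredDependency>org.example:runtime-only-dep</ignoredUnusedDeclaredDependency>
    </ignoredUnusedDeclaredDependencies>
  </configuration>
</plugin>
{code}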

> Enable dependency analysis of non-compile dependencies
> --
>
> Key: BEAM-1984
> URL: https://issues.apache.org/jira/browse/BEAM-1984
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> Currently the dependency analysis is ignoring test dependencies, however if 
> we run:
> mvn install -Dmaven.test.skip=true
> It complains in multiple modules about dependencies that should be scoped as 
> test but currently aren't.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1726) Verify PAssert execution in TestFlinkRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1726:

Summary: Verify PAssert execution in TestFlinkRunner  (was: TestFlinkRunner 
should assert PAssert success/failure counters)

> Verify PAssert execution in TestFlinkRunner
> ---
>
> Key: BEAM-1726
> URL: https://issues.apache.org/jira/browse/BEAM-1726
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aviem Zur
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: First stable release
>
>
> {{TestFlinkRunner}} should assert the value of {{PAssert}}'s success/failure 
> aggregators. i.e. assert that:
> {code}
>  == [number of assertions defined in  the 
> pipeline]
> &&
>  == 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-2002) Verify PAssert execution in TestSparkRunner

2017-04-18 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2002:
---

 Summary: Verify PAssert execution in TestSparkRunner
 Key: BEAM-2002
 URL: https://issues.apache.org/jira/browse/BEAM-2002
 Project: Beam
  Issue Type: Sub-task
  Components: runner-spark
Reporter: Aviem Zur
Assignee: Amit Sela
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-2003) Verify PAssert execution in TestDataflowRunner

2017-04-18 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2003:
---

 Summary: Verify PAssert execution in TestDataflowRunner
 Key: BEAM-2003
 URL: https://issues.apache.org/jira/browse/BEAM-2003
 Project: Beam
  Issue Type: Sub-task
  Components: runner-dataflow
Reporter: Aviem Zur
Assignee: Daniel Halperin
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-2002) Verify PAssert execution in TestSparkRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-2002:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Verify PAssert execution in TestSparkRunner
> ---
>
> Key: BEAM-2002
> URL: https://issues.apache.org/jira/browse/BEAM-2002
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2001) Verify PAssert execution in all runners

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2001:

Description: 
Currently, {{PAssert}} assertions may not happen and tests will pass while 
silently hiding issues.

For {{ValidatesRunner}} tests to truly validate a runner supports the model we 
need to verify the {{PAssert}} assertions actually ran.

See [dev list 
discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].

In order to reduce duplication, for runners which support metrics we could 
verify this in {{TestPipeline}}, removing the need for the runner itself to 
make this assertion (See https://issues.apache.org/jira/browse/BEAM-1763).

  was:
Currently, {{PAssert}} assertions may not happen and tests will pass while 
silently hiding issues.

For {{ValidatesRunner}} tests to truly validate a runner supports the model we 
need to verify the {{PAssert}} assertions actually ran.

See [dev list 
discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].

In order to reduce duplication, for runners which support metrics we could 
verify this in {{TestPipeline}}, removing the need for the runner itself to 
make this assertion.


> Verify PAssert execution in all runners
> ---
>
> Key: BEAM-2001
> URL: https://issues.apache.org/jira/browse/BEAM-2001
> Project: Beam
>  Issue Type: Test
>  Components: runner-core
>Reporter: Aviem Zur
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> Currently, {{PAssert}} assertions may not happen and tests will pass while 
> silently hiding issues.
> For {{ValidatesRunner}} tests to truly validate a runner supports the model 
> we need to verify the {{PAssert}} assertions actually ran.
> See [dev list 
> discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].
> In order to reduce duplication, for runners which support metrics we could 
> verify this in {{TestPipeline}}, removing the need for the runner itself to 
> make this assertion (See https://issues.apache.org/jira/browse/BEAM-1763).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1763) TestPipeline should ensure that all assertions succeeded

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1763:

Fix Version/s: (was: First stable release)

> TestPipeline should ensure that all assertions succeeded
> 
>
> Key: BEAM-1763
> URL: https://issues.apache.org/jira/browse/BEAM-1763
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Aviem Zur
>
> This doesn't need to be part of each {{PipelineRunner}} implementation if it 
> goes through the {{PipelineResult}} APIs. The assertion can be of the form 
> that if the Pipeline is finished, then the number of successful assertions is 
> equal to the total number of assertions.
> Suggested solution:
> For runners which support metrics, use the counters for successful/failed 
> assertions and compare them to expected number of assertions.
> For runners which do not support metrics consider implementing a fallback 
> solution:
> {{PAssert}} will write to temp files, which will be placed in the test's 
> {{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
> temp files' names will include a UUID generated for each test, this UUID will 
> be accessible to {{PAssert}} or the text sink.
> Successful assertion count and failed assertion count + exceptions will be 
> written to text files which will then be accessible to {{TestPipeline}} to 
> assert on.
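
The fallback mechanism described above could be sketched roughly as follows. 
This is plain Java with hypothetical helper names ({{writeCounts}}, 
{{readCounts}}), not Beam API: a runner-side writer records per-test assertion 
counts to a UUID-named file under the test's tempLocation, and the test side 
reads the file back and checks that every assertion ran and none failed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;

public class PAssertCountsSketch {

  // Runner side: record success/failure counts in a file whose name
  // embeds the per-test UUID, so TestPipeline can find it later.
  static Path writeCounts(Path tempLocation, UUID testId,
                          int successes, int failures) throws IOException {
    Path file = tempLocation.resolve("passert-" + testId + ".txt");
    Files.write(file, List.of(successes + "," + failures));
    return file;
  }

  // Test side: read the counts back.
  static int[] readCounts(Path file) throws IOException {
    String[] parts = Files.readAllLines(file).get(0).split(",");
    return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("beam-test");
    UUID testId = UUID.randomUUID();
    // Simulate a run in which 3 assertions succeeded and none failed.
    Path file = writeCounts(tmp, testId, 3, 0);
    int[] counts = readCounts(file);
    int expectedAssertions = 3;
    if (counts[0] != expectedAssertions || counts[1] != 0) {
      throw new AssertionError("PAssert assertions did not all run");
    }
    System.out.println("successes=" + counts[0] + " failures=" + counts[1]);
  }
}
```

In a real implementation the file would live on the pipeline's filesystem 
(local, GCS, or HDFS) rather than a JVM temp directory, and the expected 
assertion count would come from the constructed pipeline.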



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1763) TestPipeline should ensure that all assertions succeeded

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1763:
---

Assignee: Aviem Zur

> TestPipeline should ensure that all assertions succeeded
> 
>
> Key: BEAM-1763
> URL: https://issues.apache.org/jira/browse/BEAM-1763
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Aviem Zur
>
> This doesn't need to be part of each {{PipelineRunner}} implementation if it 
> goes through the {{PipelineResult}} APIs. The assertion can be of the form 
> that if the Pipeline is finished, then the number of successful assertions is 
> equal to the total number of assertions.
> Suggested solution:
> For runners which support metrics, use the counters for successful/failed 
> assertions and compare them to expected number of assertions.
> For runners which do not support metrics consider implementing a fallback 
> solution:
> {{PAssert}} will write to temp files, which will be placed in the test's 
> {{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
> temp files' names will include a UUID generated for each test, this UUID will 
> be accessible to {{PAssert}} or the text sink.
> Successful assertion count and failed assertion count + exceptions will be 
> written to text files which will then be accessible to {{TestPipeline}} to 
> assert on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1591) Implement Combine optimizations for GABW in streaming.

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1591:
---

Assignee: Kobi Salant  (was: Aviem Zur)

> Implement Combine optimizations for GABW in streaming.
> --
>
> Key: BEAM-1591
> URL: https://issues.apache.org/jira/browse/BEAM-1591
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Kobi Salant
>
> This should be straight-forward.
> Introduce {{AccumT}} generics in {{SparkGroupAlsoByWindowViaWindowSet}} and 
> call with {{InputT}} for GBK and {{AccumT}} with Combine.
> Pass the proper {{SystemReduceFn}} instead of creating it in 
> {{SparkGroupAlsoByWindowViaWindowSet}}.
> For combine, extract the output from the fired accumulated output. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-2004) Verify PAssert execution in TestApexRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-2004:
---

Assignee: Thomas Weise

> Verify PAssert execution in TestApexRunner
> --
>
> Key: BEAM-2004
> URL: https://issues.apache.org/jira/browse/BEAM-2004
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-apex
>Reporter: Aviem Zur
>Assignee: Thomas Weise
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-2004) Verify PAssert execution in TestApexRunner

2017-04-18 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2004:
---

 Summary: Verify PAssert execution in TestApexRunner
 Key: BEAM-2004
 URL: https://issues.apache.org/jira/browse/BEAM-2004
 Project: Beam
  Issue Type: Sub-task
  Components: runner-apex
Reporter: Aviem Zur
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2002) Verify PAssert execution in TestSparkRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2002:

Fix Version/s: First stable release

> Verify PAssert execution in TestSparkRunner
> ---
>
> Key: BEAM-2002
> URL: https://issues.apache.org/jira/browse/BEAM-2002
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
> Fix For: First stable release
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2004) Verify PAssert execution in TestApexRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2004:

Fix Version/s: First stable release

> Verify PAssert execution in TestApexRunner
> --
>
> Key: BEAM-2004
> URL: https://issues.apache.org/jira/browse/BEAM-2004
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-apex
>Reporter: Aviem Zur
>Assignee: Thomas Weise
>Priority: Blocker
> Fix For: First stable release
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2003) Verify PAssert execution in TestDataflowRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2003:

Fix Version/s: First stable release

> Verify PAssert execution in TestDataflowRunner
> --
>
> Key: BEAM-2003
> URL: https://issues.apache.org/jira/browse/BEAM-2003
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Blocker
> Fix For: First stable release
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2001) Verify PAssert execution in all runners

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2001:

Description: 
Currently, {{PAssert}} assertions may not happen and tests will pass while 
silently hiding issues.

For {{ValidatesRunner}} tests to truly validate a runner supports the model we 
need to verify the {{PAssert}} assertions actually ran.

See [dev list 
discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].

In order to reduce duplication, for runners which support metrics we could 
verify this in {{TestPipeline}}, removing the need for the runner itself to 
make this assertion.

  was:
Currently, {{PAssert}} assertions may not happen and tests will pass while 
silently hiding issues.

For {{ValidatesRunner}} tests to truly validate a runner supports the model we 
need to verify the {{PAssert}}s actually ran.

See [dev list 
discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].

In order to reduce duplication, for runners which support metrics we could 
verify this in {{TestPipeline}}, removing the need for the runner itself to 
make this assertion.


> Verify PAssert execution in all runners
> ---
>
> Key: BEAM-2001
> URL: https://issues.apache.org/jira/browse/BEAM-2001
> Project: Beam
>  Issue Type: Test
>  Components: runner-core
>Reporter: Aviem Zur
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> Currently, {{PAssert}} assertions may not happen and tests will pass while 
> silently hiding issues.
> For {{ValidatesRunner}} tests to truly validate a runner supports the model 
> we need to verify the {{PAssert}} assertions actually ran.
> See [dev list 
> discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].
> In order to reduce duplication, for runners which support metrics we could 
> verify this in {{TestPipeline}}, removing the need for the runner itself to 
> make this assertion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1726) TestFlinkRunner should assert PAssert success/failure counters

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1726:

Issue Type: Sub-task  (was: Improvement)
Parent: BEAM-2001

> TestFlinkRunner should assert PAssert success/failure counters
> --
>
> Key: BEAM-1726
> URL: https://issues.apache.org/jira/browse/BEAM-1726
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aviem Zur
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: First stable release
>
>
> {{TestFlinkRunner}} should assert the value of {{PAssert}}'s success/failure 
> aggregators. i.e. assert that:
> {code}
>  == [number of assertions defined in  the 
> pipeline]
> &&
>  == 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1726) Verify PAssert execution in TestFlinkRunner

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1726:

Description: (was: {{TestFlinkRunner}} should assert the value of 
{{PAssert}}'s success/failure aggregators. i.e. assert that:

{code}
 == [number of assertions defined in  the 
pipeline]
&&
 == 0
{code})

> Verify PAssert execution in TestFlinkRunner
> ---
>
> Key: BEAM-1726
> URL: https://issues.apache.org/jira/browse/BEAM-1726
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aviem Zur
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: First stable release
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1763) TestPipeline should ensure that all assertions succeeded

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1763:

Issue Type: Sub-task  (was: Improvement)
Parent: BEAM-2001

> TestPipeline should ensure that all assertions succeeded
> 
>
> Key: BEAM-1763
> URL: https://issues.apache.org/jira/browse/BEAM-1763
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Priority: Blocker
> Fix For: First stable release
>
>
> This doesn't need to be part of each {{PipelineRunner}} implementation if it 
> goes through the {{PipelineResult}} APIs. The assertion can be of the form 
> that if the Pipeline is finished, then the number of successful assertions is 
> equal to the total number of assertions.
> Suggested solution:
> For runners which support metrics, use the counters for successful/failed 
> assertions and compare them to expected number of assertions.
> For runners which do not support metrics consider implementing a fallback 
> solution:
> {{PAssert}} will write to temp files, which will be placed in the test's 
> {{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
> temp files' names will include a UUID generated for each test, this UUID will 
> be accessible to {{PAssert}} or the text sink.
> Successful assertion count and failed assertion count + exceptions will be 
> written to text files which will then be accessible to {{TestPipeline}} to 
> assert on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1763) TestPipeline should ensure that all assertions succeeded

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1763:

Priority: Major  (was: Blocker)

> TestPipeline should ensure that all assertions succeeded
> 
>
> Key: BEAM-1763
> URL: https://issues.apache.org/jira/browse/BEAM-1763
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Thomas Groh
> Fix For: First stable release
>
>
> This doesn't need to be part of each {{PipelineRunner}} implementation if it 
> goes through the {{PipelineResult}} APIs. The assertion can be of the form 
> that if the Pipeline is finished, then the number of successful assertions is 
> equal to the total number of assertions.
> Suggested solution:
> For runners which support metrics, use the counters for successful/failed 
> assertions and compare them to expected number of assertions.
> For runners which do not support metrics consider implementing a fallback 
> solution:
> {{PAssert}} will write to temp files, which will be placed in the test's 
> {{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
> temp files' names will include a UUID generated for each test, this UUID will 
> be accessible to {{PAssert}} or the text sink.
> Successful assertion count and failed assertion count + exceptions will be 
> written to text files which will then be accessible to {{TestPipeline}} to 
> assert on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1763) TestPipeline should ensure that all assertions succeeded

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1763:

Description: 
This doesn't need to be part of each {{PipelineRunner}} implementation if it 
goes through the {{PipelineResult}} APIs. The assertion can be of the form that 
if the Pipeline is finished, then the number of successful assertions is equal 
to the total number of assertions.

Suggested solution:

For runners which support metrics, use the counters for successful/failed 
assertions and compare them to expected number of assertions.

For runners which do not support metrics consider implementing a fallback 
solution:
{{PAssert}} will write to temp files, which will be placed in the test's 
{{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
temp files' names will include a UUID generated for each test, this UUID will 
be accessible to {{PAssert}} or the text sink.
Successful assertion count and failed assertion count + exceptions will be 
written to text files which will then be accessible to {{TestPipeline}} to 
assert on.

  was:
This doesn't need to be part of each {{PipelineRunner}} implementation if it 
goes through the {{PipelineResult}} APIs. The assertion can be of the form that 
if the Pipeline is finished, then the number of successful assertions is equal 
to the total number of assertions.

Suggested solution:
For runners which support metrics, use the counters for successful/failed 
assertions and compare them to expected number of assertions.
For runners which do not support metrics consider this solution:
{{PAssert}} will write to temp files, which will be placed in the test's 
{{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
temp files' names will include a UUID generated for each test, this UUID will 
be accessible to {{PAssert}} or the text sink.
Successful assertion count and failed assertion count + exceptions will be 
written to text files which will then be accessible to {{TestPipeline}} to 
assert on.


> TestPipeline should ensure that all assertions succeeded
> 
>
> Key: BEAM-1763
> URL: https://issues.apache.org/jira/browse/BEAM-1763
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Priority: Blocker
> Fix For: First stable release
>
>
> This doesn't need to be part of each {{PipelineRunner}} implementation if it 
> goes through the {{PipelineResult}} APIs. The assertion can be of the form 
> that if the Pipeline is finished, then the number of successful assertions is 
> equal to the total number of assertions.
> Suggested solution:
> For runners which support metrics, use the counters for successful/failed 
> assertions and compare them to expected number of assertions.
> For runners which do not support metrics consider implementing a fallback 
> solution:
> {{PAssert}} will write to temp files, which will be placed in the test's 
> {{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
> temp files' names will include a UUID generated for each test, this UUID will 
> be accessible to {{PAssert}} or the text sink.
> Successful assertion count and failed assertion count + exceptions will be 
> written to text files which will then be accessible to {{TestPipeline}} to 
> assert on.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-2001) Verify PAssert execution in all runners

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-2001:
---

Assignee: Aviem Zur  (was: Kenneth Knowles)

> Verify PAssert execution in all runners
> ---
>
> Key: BEAM-2001
> URL: https://issues.apache.org/jira/browse/BEAM-2001
> Project: Beam
>  Issue Type: Test
>  Components: runner-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
> Fix For: First stable release
>
>
> Currently, {{PAssert}} assertions may not happen and tests will pass while 
> silently hiding issues.
> For {{ValidatesRunner}} tests to truly validate a runner supports the model 
> we need to verify the {{PAssert}} assertions actually ran.
> See [dev list 
> discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].
> In order to reduce duplication, for runners which support metrics we could 
> verify this in {{TestPipeline}}, removing the need for the runner itself to 
> make this assertion (See https://issues.apache.org/jira/browse/BEAM-1763).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-2001) Verify PAssert execution in all runners

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-2001:
---

Assignee: (was: Aviem Zur)

> Verify PAssert execution in all runners
> ---
>
> Key: BEAM-2001
> URL: https://issues.apache.org/jira/browse/BEAM-2001
> Project: Beam
>  Issue Type: Test
>  Components: runner-core
>Reporter: Aviem Zur
>Priority: Blocker
> Fix For: First stable release
>
>
> Currently, {{PAssert}} assertions may not happen and tests will pass while 
> silently hiding issues.
> For {{ValidatesRunner}} tests to truly validate a runner supports the model 
> we need to verify the {{PAssert}} assertions actually ran.
> See [dev list 
> discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].
> In order to reduce duplication, for runners which support metrics we could 
> verify this in {{TestPipeline}}, removing the need for the runner itself to 
> make this assertion (See https://issues.apache.org/jira/browse/BEAM-1763).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2001) Verify PAssert execution in all runners

2017-04-18 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2001:

Fix Version/s: First stable release

> Verify PAssert execution in all runners
> ---
>
> Key: BEAM-2001
> URL: https://issues.apache.org/jira/browse/BEAM-2001
> Project: Beam
>  Issue Type: Test
>  Components: runner-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Blocker
> Fix For: First stable release
>
>
> Currently, {{PAssert}} assertions may not happen and tests will pass while 
> silently hiding issues.
> For {{ValidatesRunner}} tests to truly validate a runner supports the model 
> we need to verify the {{PAssert}} assertions actually ran.
> See [dev list 
> discussion|https://lists.apache.org/thread.html/9e6b9e6a21d2a657a1dd293b8cc8497c76a8a66fa3a1358733c02101@%3Cdev.beam.apache.org%3E].
> In order to reduce duplication, for runners which support metrics we could 
> verify this in {{TestPipeline}}, removing the need for the runner itself to 
> make this assertion (See https://issues.apache.org/jira/browse/BEAM-1763).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974980#comment-15974980
 ] 

Aviem Zur commented on BEAM-2005:
-

I think this should be in {{core}}, not in {{extensions}}.
Also, I think this should be a blocker for the first stable release. WDYT?

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Affects Versions: First stable release
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976046#comment-15976046
 ] 

Aviem Zur commented on BEAM-2005:
-

Regarding {{core}} vs {{extensions}}: this can reside in a separate module from 
core, but I think core should depend on it so users get this functionality out 
of the box.

For example, if a user uses {{TextIO}} and it works for them when passing 
{{"file://path/to/file"}}, changing this to {{"hdfs://path/to/file"}} should 
work without the need to add a new dependency to their project.

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem creates an abstraction for reading from files in many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
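The "changing {{file://}} to {{hdfs://}} should just work" argument above rests on scheme-based dispatch to a registered file-system implementation. The following is a minimal, hypothetical sketch of that idea in plain Java; the class and method names are illustrative and are not Beam's actual {{FileSystems}} API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of scheme-based file-system dispatch: a registry maps a
// URI scheme ("file", "hdfs", ...) to a handler, so a path's scheme alone
// selects the implementation. Names here are illustrative, not Beam's API.
public class SchemeRegistry {
    private final Map<String, String> handlers = new HashMap<>();

    // Register a handler (a stand-in for a FileSystem implementation) per scheme.
    public void register(String scheme, String handler) {
        handlers.put(scheme, handler);
    }

    // Extract the scheme from a path spec; default to "file" when none is given.
    public static String schemeOf(String spec) {
        int i = spec.indexOf("://");
        return i < 0 ? "file" : spec.substring(0, i);
    }

    // Look up the handler for a path; unknown schemes fail loudly, which is
    // what a user without the Hadoop module on the classpath would see.
    public String resolve(String spec) {
        String handler = handlers.get(schemeOf(spec));
        if (handler == null) {
            throw new IllegalArgumentException("No file system registered for: " + spec);
        }
        return handler;
    }
}
```

Under this model, whether the Hadoop handler is bundled with {{core}} or pulled in as a separate dependency only changes which schemes are present in the registry at runtime, not the user's pipeline code.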


[jira] [Updated] (BEAM-1763) TestPipeline should ensure that all assertions succeeded

2017-04-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1763:

Description: 
This doesn't need to be part of each {{PipelineRunner}} implementation if it 
goes through the {{PipelineResult}} APIs. The assertion can be of the form that 
if the Pipeline is finished, then the number of successful assertions is equal 
to the total number of assertions.

Suggested solution:

For runners which support metrics, use the counters for successful/failed 
assertions and compare them to the expected number of assertions.
Runners which do not support metrics should either implement metrics or 
override {{PAssert}} in a way that verifies its execution.

  was:
This doesn't need to be part of each {{PipelineRunner}} implementation if it 
goes through the {{PipelineResult}} APIs. The assertion can be of the form that 
if the Pipeline is finished, then the number of successful assertions is equal 
to the total number of assertions.

Suggested solution:

For runners which support metrics, use the counters for successful/failed 
assertions and compare them to the expected number of assertions.

For runners which do not support metrics consider implementing a fallback 
solution:
{{PAssert}} will write to temp files, which will be placed in the test's 
{{tempLocation}} ({{file}} in local mode, {{gs/hdfs}} in cluster mode). These 
temp files' names will include a UUID generated for each test, this UUID will 
be accessible to {{PAssert}} or the text sink.
Successful assertion count and failed assertion count + exceptions will be 
written to text files which will then be accessible to {{TestPipeline}} to 
assert on.


> TestPipeline should ensure that all assertions succeeded
> 
>
> Key: BEAM-1763
> URL: https://issues.apache.org/jira/browse/BEAM-1763
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Aviem Zur
>
> This doesn't need to be part of each {{PipelineRunner}} implementation if it 
> goes through the {{PipelineResult}} APIs. The assertion can be of the form 
> that if the Pipeline is finished, then the number of successful assertions is 
> equal to the total number of assertions.
> Suggested solution:
> For runners which support metrics, use the counters for successful/failed 
> assertions and compare them to the expected number of assertions.
> Runners which do not support metrics should either implement metrics or 
> override {{PAssert}} in a way that verifies its execution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
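The counter-comparison check suggested above can be sketched in plain Java. This is only an illustration of the verification logic a {{TestPipeline}} would run after the pipeline finishes; the method and class names are hypothetical and do not reflect Beam's actual metrics API.

```java
// Illustrative sketch of the suggested TestPipeline-side check: once the
// pipeline has finished, compare the successful/failed PAssert counters
// against the number of assertions the test declared. Hypothetical names.
public class AssertionCheck {
    public static void verify(long expected, long successful, long failed) {
        // Any failed assertion fails the test outright.
        if (failed > 0) {
            throw new AssertionError(failed + " PAssert assertion(s) failed");
        }
        // A shortfall means some assertions silently never executed.
        if (successful != expected) {
            throw new AssertionError(
                "Expected " + expected + " successful assertions, saw " + successful);
        }
    }
}
```

The key property is the second branch: it converts "assertion never ran" from a silent pass into a test failure, which is exactly the gap described in the issue.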


[jira] [Created] (BEAM-2057) Test metrics are reported to Spark Metrics sink.

2017-04-23 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2057:
---

 Summary: Test metrics are reported to Spark Metrics sink.
 Key: BEAM-2057
 URL: https://issues.apache.org/jira/browse/BEAM-2057
 Project: Beam
  Issue Type: Test
  Components: runner-spark
Reporter: Aviem Zur
Assignee: Amit Sela


Test that metrics are reported to Spark's metric sink.

Use {{InMemoryMetrics}} and {{InMemoryMetricsSinkRule}} similarly to the 
{{NamedAggregatorsTest}} which tests that aggregators are reported to Spark's 
metrics sink (aggregators are being removed, so this test should be in a 
separate class).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-2057) Test metrics are reported to Spark Metrics sink.

2017-04-23 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-2057:
---

Assignee: (was: Amit Sela)

> Test metrics are reported to Spark Metrics sink.
> 
>
> Key: BEAM-2057
> URL: https://issues.apache.org/jira/browse/BEAM-2057
> Project: Beam
>  Issue Type: Test
>  Components: runner-spark
>Reporter: Aviem Zur
>  Labels: newbie, starter
>
> Test that metrics are reported to Spark's metric sink.
> Use {{InMemoryMetrics}} and {{InMemoryMetricsSinkRule}} similarly to the 
> {{NamedAggregatorsTest}} which tests that aggregators are reported to Spark's 
> metrics sink (aggregators are being removed, so this test should be in a 
> separate class).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2057) Test metrics are reported to Spark Metrics sink.

2017-04-23 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2057:

Description: 
Test that metrics are reported to Spark's metric sink.

Use {{InMemoryMetrics}} and {{InMemoryMetricsSinkRule}} similarly to the 
{{NamedAggregatorsTest}} which tests that aggregators are reported to Spark's 
metrics sink (aggregators are being removed, so this test should be in a 
separate class).

For an example on how to create a pipeline with metrics take a look at 
{{MetricsTest}}.

  was:
Test that metrics are reported to Spark's metric sink.

Use {{InMemoryMetrics}} and {{InMemoryMetricsSinkRule}} similarly to the 
{{NamedAggregatorsTest}} which tests that aggregators are reported to Spark's 
metrics sink (aggregators are being removed, so this test should be in a 
separate class).


> Test metrics are reported to Spark Metrics sink.
> 
>
> Key: BEAM-2057
> URL: https://issues.apache.org/jira/browse/BEAM-2057
> Project: Beam
>  Issue Type: Test
>  Components: runner-spark
>Reporter: Aviem Zur
>  Labels: newbie, starter
>
> Test that metrics are reported to Spark's metric sink.
> Use {{InMemoryMetrics}} and {{InMemoryMetricsSinkRule}} similarly to the 
> {{NamedAggregatorsTest}} which tests that aggregators are reported to Spark's 
> metrics sink (Aggregators are being removes so this test should be in a 
> separate class).
> For an example on how to create a pipeline with metrics take a look at 
> {{MetricsTest}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
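The {{InMemoryMetrics}} pattern referenced above boils down to: report metrics into a sink the test itself holds, then query it after the pipeline runs. A minimal, self-contained sketch of that pattern, assuming hypothetical names (this is not the Spark runner's actual {{InMemoryMetrics}} class):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of an in-memory metrics sink for tests: metric reports are
// accumulated in a map the test can query afterwards, instead of being shipped
// to an external system. InMemorySink is a stand-in for InMemoryMetrics.
public class InMemorySink {
    private final Map<String, Long> counters = new HashMap<>();

    // Record a counter value, accumulating across repeated reports.
    public void report(String name, long value) {
        counters.merge(name, value, Long::sum);
    }

    // Query a reported value; 0 when the counter was never reported.
    public long value(String name) {
        return counters.getOrDefault(name, 0L);
    }
}
```

A test built on this pattern runs the pipeline with the sink installed (via something like the {{InMemoryMetricsSinkRule}} mentioned above) and then asserts on {{sink.value("someCounter")}}.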


[jira] [Comment Edited] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979797#comment-15979797
 ] 

Aviem Zur edited comment on BEAM-2027 at 4/22/17 7:22 AM:
--

{{local class incompatible}} happens when trying to deserialize a Java class 
whose version differs from the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

I believe that in your application the version of Beam on the driver's 
classpath may differ from the one on the executors' classpath.

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.


was (Author: aviemzur):
{{local class incompatible}} happens when trying to deserialize a Java class 
whose version differs from the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

I believe that in your case the version of Beam on the driver's classpath 
differs from the one on the executors' classpath.

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
>
> When running a YARN job with Beam 0.6.0 (spark-1.6.2-bin-hadoop2.6, 
> hadoop-2.6.0), I read files from HDFS and write records to HDFS. I sometimes 
> get the error below:
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> 
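The {{serialVersionUID}} mismatch in the stack trace above can be illustrated with plain JDK serialization. When a {{Serializable}} class declares no explicit {{serialVersionUID}}, the JVM derives one from the class's structure, so a driver and an executor compiled against different Beam versions can legitimately disagree; this sketch only demonstrates that mechanism, not Beam's classes.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Demonstrates why mismatched classpaths produce InvalidClassException: without
// an explicit serialVersionUID, the JVM computes one from the class's shape, so
// two builds of "the same" class can carry different UIDs.
public class SerialUidDemo {
    // No explicit serialVersionUID: the UID is computed from this structure
    // and changes whenever fields, methods, or the hierarchy change.
    static class Payload implements Serializable {
        int value;
    }

    // An explicit UID stays stable even when the class's structure evolves.
    static class PinnedPayload implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Report the UID the serialization machinery will embed in the stream.
    public static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }
}
```

In the reported failure the mismatch is inside Beam itself ({{CoderHelpers$3}}), so the fix is aligning the Beam version across driver and executors rather than pinning UIDs in user code.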

[jira] [Comment Edited] (BEAM-2027) get error sometimes while running the same code using beam0.6

2017-04-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979797#comment-15979797
 ] 

Aviem Zur edited comment on BEAM-2027 at 4/22/17 7:21 AM:
--

{{local class incompatible}} happens when trying to deserialize a Java class 
whose version differs from the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

I believe that in your case the version of Beam on the driver's classpath 
differs from the one on the executors' classpath.

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.


was (Author: aviemzur):
{{local class incompatible}} happens when trying to deserialize a Java class 
whose version differs from the one in your classpath.
The class in question is 
{{org.apache.beam.runners.spark.coders.CoderHelpers$3}} which is a part of Beam.

In your case - the version of Beam in the driver's classpath is different than 
the one in the executor's classpath. 

Spark is serializing an instance of this class in the driver at one version and 
deserializing it in an executor which expects a different version.

A few questions about your setup:

Did you package your application's jar as a fat jar containing Beam jars?
If not: do you distribute Beam jars to your executors in some other way?

Is this a streaming application? (from your description it didn't sound like 
you used streaming).
If so, Spark streaming checkpoints certain serialized instances of classes. 
When changing these in your application and resuming from checkpoint (for 
example, upgrading Beam version) you can experience failures such as this. The 
solution for this is deleting the checkpoint and starting fresh.

> get error sometimes while running the same code using beam0.6
> -
>
> Key: BEAM-2027
> URL: https://issues.apache.org/jira/browse/BEAM-2027
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-spark
> Environment: spark-1.6.2-bin-hadoop2.6, hadoop-2.6.0, source:hdfs 
> sink:hdfs
>Reporter: liyuntian
>Assignee: Aviem Zur
>
> When running a YARN job with Beam 0.6.0 (spark-1.6.2-bin-hadoop2.6, 
> hadoop-2.6.0), I read files from HDFS and write records to HDFS. I sometimes 
> get the error below:
> 17/04/20 21:10:45 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, etl-develop-003): java.io.InvalidClassException: 
> org.apache.beam.runners.spark.coders.CoderHelpers$3; local class 
> incompatible: stream classdesc serialVersionUID = 1334222146820528045, local 
> class serialVersionUID = 5119956493581628999
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at 

[jira] [Assigned] (BEAM-1919) Standard IO Metrics

2017-04-23 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1919:
---

Assignee: (was: Aviem Zur)

> Standard IO Metrics
> ---
>
> Key: BEAM-1919
> URL: https://issues.apache.org/jira/browse/BEAM-1919
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Aviem Zur
>
> A foundation for standard IO metrics, which could be used by all IOs.
> A standard IO metric is a namespace + name combination to which all IOs 
> reporting a metric in the same vein will adhere.
> Also, supply factories and members for these metrics, accessible to IOs via 
> the SDK of the language they were written in.
> [Proposal document|https://s.apache.org/standard-io-metrics]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

