[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:32 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.
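
A minimal sketch of the concern above, in plain Java with no Beam dependency. 
{{LooseJsonSink}} and {{LooseContractDemo}} are hypothetical names used only 
for illustration; they are not proposed classes. The point is simply that a 
sink typed on {{String}} has no way to reject a malformed record at write time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sink whose name promises JSON but whose input type is String. */
final class LooseJsonSink {
    private final List<String> lines = new ArrayList<>();

    /** Accepts any string; nothing here checks that it is valid JSON. */
    void write(String json) {
        lines.add(json);
    }

    /** Returns everything written so far, valid or not. */
    List<String> written() {
        return lines;
    }
}

final class LooseContractDemo {
    public static void main(String[] args) {
        LooseJsonSink sink = new LooseJsonSink();
        sink.write("{\"id\": 1}"); // valid JSON
        sink.write("{id: 1");      // not JSON, but accepted without complaint
        System.out.println(sink.written().size() + " records written");
    }
}
```

Both records are stored, so the malformed one would only surface when some 
later reader tries to parse it.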

We definitely need concrete {{JsonSink extends FileBasedSink}} and 
{{JsonSource extends FileBasedSource}} classes, but these should not 
be used directly by the user. All common JSON file logic regarding how the file 
should be constructed (as [~jkff] mentioned, this should be better defined) will 
be in this sink and source, including all file writing/reading code 
(inherited from {{FileBasedSink}} and {{FileBasedSource}}).

To avoid exposing classes that deal with Strings to the user, we need 
concrete {{PTransform}} classes that deal with objects.

The problem is that these probably can't live in a {{JsonIO}} class, since it 
cannot own the transformation from objects to JSON strings (there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}} (Similar 
to {{AvroIO}})?
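
One way to picture the split being asked about, again as a plain-Java sketch 
rather than Beam code. All names here ({{JsonLineSink}}, {{TypedJsonWriter}}, 
{{UserRecord}}, {{userToJson}}) are hypothetical. The hand-written serializer 
stands in for whatever a {{JacksonIO}}-style class would delegate to a JSON 
library, and it only escapes quotes and backslashes, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical internal sink: knows the file layout, works on JSON strings. */
final class JsonLineSink {
    private final List<String> records = new ArrayList<>();

    void writeRecord(String json) {
        records.add(json);
    }

    List<String> written() {
        return records;
    }
}

/** Hypothetical user-facing writer: accepts objects and owns the object-to-JSON step. */
final class TypedJsonWriter<T> {
    private final Function<T, String> toJson;
    private final JsonLineSink sink = new JsonLineSink();

    TypedJsonWriter(Function<T, String> toJson) {
        this.toJson = toJson;
    }

    /** The user never hands over a raw string, only an element of type T. */
    void write(T element) {
        sink.writeRecord(toJson.apply(element));
    }

    List<String> output() {
        return sink.written();
    }
}

/** Hypothetical element type used for the demo. */
final class UserRecord {
    final int id;
    final String name;

    UserRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class TypedWriterDemo {
    /** Stand-in serializer; escapes only backslashes and double quotes. */
    static String userToJson(UserRecord u) {
        String escaped = u.name.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"id\":" + u.id + ",\"name\":\"" + escaped + "\"}";
    }

    public static void main(String[] args) {
        TypedJsonWriter<UserRecord> writer = new TypedJsonWriter<>(TypedWriterDemo::userToJson);
        writer.write(new UserRecord(1, "Ann"));
        System.out.println(writer.output());
    }
}
```

Because the serializer is supplied from outside, the string-based sink can stay 
library-agnostic while each library-specific class provides its own typed entry 
point.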


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

We definitely need concrete {{JsonSink extends FileBasedSink}} and 
{{JsonSource  extends FileBasedSource}} classes. But these should not 
be used directly by the user. All common JSON file logic regarding how the file 
should be constructed (As [~jkff] mentioned this should be better defined) will 
be in these sink and source, including all file writing/reading related code 
(Inherited from {{FileBasedSink}} and {{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from objects to JSON Strings (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?

> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Reporter: Aviem Zur
> Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these would be a {{JsonSource}}/{{JsonSink}} 
> which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:26 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user. All common JSON file logic 
regarding how the file should be constructed (As [~jkff] mentioned this should 
be better defined) will be in these sink and source, including all file 
writing/reading related code (Inherited from {{FileBasedSink}} and 
{{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from objects to JSON Strings (since there are several 
ways to implement this).

Should these transforms be in a separate class such as {{JacksonIO}}?


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user. All common JSON file logic 
regarding how the file should be constructed (As [~jkff] mentioned this should 
be better defined) will be in these sink and source, including all file 
writing/reading related code (Inherited from {{FileBasedSink}} and 
{{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as JacksonIO?

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}



[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:24 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user. All common JSON file logic 
regarding how the file should be constructed (As [~jkff] mentioned this should 
be better defined) will be in these sink and source, including all file 
writing/reading related code (Inherited from {{FileBasedSink}} and 
{{FileBasedSource}}).

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as JacksonIO?

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user.

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as JacksonIO?

All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}



[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:24 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.

We definitely need concrete {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. But 
these should not be used directly by the user.

In order to avoid exposing classes which deal with Strings to the user we need 
concrete {{PTransform}} classes which deal with objects.

The problem is these probably can't exist in a {{JsonIO}} class since it cannot 
have the transformations from object to JSON string (since there are several 
ways to implement this).

Should these transforms be in a separate class such as JacksonIO?

All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

I agree that {{JsonIO}} itself should not be abstract and should not implement 
any interface (same as other {{XxxIO}} classes that exist today).
It will be the enclosing class of {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. These 
will not be exposed to the user.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}



[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:19 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.

I agree that {{JsonIO}} itself should not be abstract and should not implement 
any interface (same as other {{XxxIO}} classes that exist today).
It will be the enclosing class of {{JsonSink}} and {{JsonSource}} classes which 
extend the existing abstract {{FileBasedSink}} and {{FileBasedSource}}. These 
will not be exposed to the user.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
We'll need {{JsonSink}} and {{JsonSource}} classes which extend the existing 
abstract {{FileBasedSink}} and {{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}



[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:18 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
We'll need {{JsonSink}} and {{JsonSource}} classes which extend the existing 
abstract {{FileBasedSink}} and {{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}



[jira] [Comment Edited] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926687#comment-15926687
 ] 

Aviem Zur edited comment on BEAM-1581 at 3/15/17 6:16 PM:
--

I think we should avoid exposing a contract to the user that promises to write 
JSON but accepts strings.
This is a loose contract that leaves JSON validity up to the user. If the 
user does not create valid JSON strings, errors can occur, and they might be 
detected very late in the process, possibly only when another process (which 
may belong to a different user, as JSON is often used for integration) attempts 
to consume the data.

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).
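
The abstract-sink layering described in this revision could look roughly like 
the following plain-Java sketch. {{AbstractJsonSink}}, {{PointSink}} and the 
newline-delimited layout are assumptions made for illustration only; the 
thread leaves the actual file layout undefined, and a real {{JacksonSink}} 
would call a JSON library instead of building the string by hand.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstract sink: owns the file layout, leaves serialization to subclasses. */
abstract class AbstractJsonSink<T> {
    private final List<String> records = new ArrayList<>();

    /** Subclasses decide how an element becomes a JSON string. */
    protected abstract String toJson(T element);

    /** Shared logic: serialize the element and keep it as one record. */
    final void write(T element) {
        records.add(toJson(element));
    }

    /** Shared layout decision (an assumption here): one record per line. */
    final String fileContents() {
        return String.join("\n", records);
    }
}

/** Hypothetical concrete sink for two-element int arrays treated as points. */
final class PointSink extends AbstractJsonSink<int[]> {
    @Override
    protected String toJson(int[] p) {
        return "{\"x\":" + p[0] + ",\"y\":" + p[1] + "}";
    }
}

final class AbstractSinkDemo {
    public static void main(String[] args) {
        PointSink sink = new PointSink();
        sink.write(new int[] {1, 2});
        sink.write(new int[] {3, 4});
        System.out.println(sink.fileContents());
    }
}
```

Here the base class fixes how records are laid out in the file, and each 
concrete subclass only supplies the object-to-JSON step.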

So none of these are actually an abstract {{PTransform}}. 
The {{PTransform}} that the user will use will be concrete, found in 
{{JacksonIO}}


was (Author: aviemzur):
I think we should avoid exposing a contract to the user which promises writing 
JSONs but accepts strings.
This is a loose contract which will leave JSON validity up to the user. If the 
user does not create valid JSON Strings errors can occur.
Errors which might be detected very late in the process, possibly only upon an 
attempt to consume the data in another process (which may belong to a different 
user as JSON is often used for integration).

I think {{JsonIO}} itself should not be abstract and should not implement any 
interface (same as other {{XxxIO}} classes that exist today).
{{JsonIO}} should be the enclosing class of abstract {{JsonSink}} and 
{{JsonSource}} classes which extend the existing abstract {{FileBasedSink}} and 
{{FileBasedSource}}.

These will be implemented by concrete classes such as {{JacksonSink}} and 
{{JacksonSource}}.
All common JSON file logic regarding how the file should be constructed (As 
[~jkff] mentioned this should be better defined) will be in the abstract sink 
and source, including all file writing/reading related code (Inherited from 
{{FileBasedSink}} and {{FileBasedSource}}).

So none of these are actually an abstract {{PTransform}}. The {{PTransform}}s 
the user will end up using here are {{Write}} and {{Read}} from the {{io}} 
package.

Example:
{code}
Write.to(JacksonSink.of(...)) // for writing
Read.from(JacksonSource.of(...)) // for reading
{code}
