[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184339
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 11/Jan/19 20:33
Start Date: 11/Jan/19 20:33
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7373: 
[BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184339)
Time Spent: 9h 40m  (was: 9.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 9h 40m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, will 
> convert rows between them. It must be possible to opt out of certain kinds 
> of conversions; for instance, converting ints to shorts can cause overflow. 
> As another example, a schema could have a nullable field that never holds a 
> NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
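
The widening and checked-narrowing items in the list above can be pictured with plain Java primitives. This is only a sketch under stated assumptions: `NarrowingSketch`, `widen`, `narrowToShort`, and `narrowToInt` are hypothetical names chosen for illustration and are not part of Beam's `Cast` transform.

```java
/** Illustrative only: plain-Java widening and overflow-checked narrowing. */
final class NarrowingSketch {

  /** Widening int -> long never loses information, so no check is needed. */
  static long widen(int value) {
    return value;
  }

  /** Narrowing int -> short with an explicit runtime range check. */
  static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("short overflow: " + value);
    }
    return (short) value;
  }

  /** Narrowing long -> int; Math.toIntExact throws ArithmeticException on overflow. */
  static int narrowToInt(long value) {
    return Math.toIntExact(value);
  }

  public static void main(String[] args) {
    System.out.println(widen(42));
    System.out.println(narrowToShort(1234));
    try {
      narrowToShort(40_000); // outside the short range, so this is rejected
    } catch (ArithmeticException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

An unchecked `(short) value` cast would silently wrap around instead of failing, which is the behaviour the issue asks users to be able to opt out of.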



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184183
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 11/Jan/19 12:31
Start Date: 11/Jan/19 12:31
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-453502723
 
 
   @kennknowles thanks for the review! I fixed the code and rebased
 



Issue Time Tracking
---

Worklog Id: (was: 184183)
Time Spent: 9.5h  (was: 9h 20m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184147
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 11/Jan/19 10:18
Start Date: 11/Jan/19 10:18
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7373: [BEAM-5918] 
Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#discussion_r247065259
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastTest.java
 ##
 @@ -36,137 +32,252 @@
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
+import org.junit.rules.ExpectedException;
 
 /** Tests for {@link Cast}. */
 public class CastTest {
 
   @Rule public final transient TestPipeline pipeline = TestPipeline.create();
+  @Rule public transient ExpectedException expectedException = ExpectedException.none();
 
   @Test
   @Category(NeedsRunner.class)
-  public void testProjection() throws Exception {
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(Projection2.class);
-    PCollection pojos =
+  public void testProjection() {
+    Schema inputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT16),
+            Schema.Field.of("f1", Schema.FieldType.INT32),
+            Schema.Field.of("f2", Schema.FieldType.STRING));
+
+    // remove f0 and reorder f1 and f2
+    Schema outputSchema =
+        Schema.of(
+            Schema.Field.of("f2", Schema.FieldType.STRING),
+            Schema.Field.of("f1", Schema.FieldType.INT32));
+
+    Row input = Row.withSchema(inputSchema).addValues((short) 1, 2, "3").build();
+    Row expected = Row.withSchema(outputSchema).addValues("3", 2).build();
+
+    PCollection output =
         pipeline
-            .apply(Create.of(new Projection1()))
-            .apply(Cast.widening(outputSchema))
-            .apply(Convert.to(Projection2.class));
+            .apply(Create.of(input).withRowSchema(inputSchema))
+            .apply(Cast.widening(outputSchema));
+
+    PAssert.that(output).containsInAnyOrder(expected);
 
-    PAssert.that(pojos).containsInAnyOrder(new Projection2());
     pipeline.run();
   }
 
   @Test
   @Category(NeedsRunner.class)
-  public void testTypeWiden() throws Exception {
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden2.class);
+  public void testTypeWiden() {
+    Schema inputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT16),
+            Schema.Field.of("f1", Schema.FieldType.INT32));
+
+    Schema outputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT32),
+            Schema.Field.of("f1", Schema.FieldType.INT64));
 
-    PCollection pojos =
+    Row input = Row.withSchema(inputSchema).addValues((short) 1, 2).build();
+    Row expected = Row.withSchema(outputSchema).addValues(1, 2L).build();
+
+    PCollection output =
         pipeline
-            .apply(Create.of(new TypeWiden1()))
-            .apply(Cast.widening(outputSchema))
-            .apply(Convert.to(TypeWiden2.class));
+            .apply(Create.of(input).withRowSchema(inputSchema))
+            .apply(Cast.widening(outputSchema));
+
+    PAssert.that(output).containsInAnyOrder(expected);
 
-    PAssert.that(pojos).containsInAnyOrder(new TypeWiden2());
     pipeline.run();
   }
 
   @Test
-  @Category(NeedsRunner.class)
-  public void testTypeNarrow() throws Exception {
-    // narrowing is the opposite of widening
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden1.class);
+  public void testTypeWidenFail() {
+    Schema inputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT16),
+            Schema.Field.of("f1", Schema.FieldType.INT64));
 
-    PCollection pojos =
-        pipeline
-            .apply(Create.of(new TypeWiden2()))
-            .apply(Cast.narrowing(outputSchema))
-            .apply(Convert.to(TypeWiden1.class));
+    Schema outputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT32),
+            Schema.Field.of("f1", Schema.FieldType.INT32));
 
-    PAssert.that(pojos).containsInAnyOrder(new TypeWiden1());
-    pipeline.run();
-  }
+    expectedException.expect(IllegalArgumentException.class);
+    expectedException.expectMessage(containsString("f1: Can't cast 'INT64' to 'INT32'"));
 
-  @Test(expected = IllegalArgumentException.class)
-  @Category(NeedsRunner.class)
-  public void testTypeNarrowFail() throws Exception {
-    // narrowing is the opposite of widening
-    Schema inputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden2.class);
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden1.class);
-
-C
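
The projection case exercised by `testProjection` (drop `f0`, reorder `f1` and `f2`) comes down to looking up each output field by name in the input. Here is a minimal sketch with plain Java collections, assuming a row can be modelled as an ordered name-to-value map. `ProjectionSketch` and `project` are hypothetical names; the sketch does not use Beam's `Row` or `Schema` classes and is not the code under review.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: projection and reordering of fields by name. */
final class ProjectionSketch {

  /** Returns the values of outputFields, in that order, taken from input by name. */
  static Map<String, Object> project(Map<String, Object> input, List<String> outputFields) {
    Map<String, Object> output = new LinkedHashMap<>();
    for (String field : outputFields) {
      if (!input.containsKey(field)) {
        throw new IllegalArgumentException(field + ": field is missing from the input");
      }
      output.put(field, input.get(field));
    }
    return output;
  }

  public static void main(String[] args) {
    Map<String, Object> input = new LinkedHashMap<>();
    input.put("f0", (short) 1);
    input.put("f1", 2);
    input.put("f2", "3");
    // Drop f0 and emit f2 before f1, mirroring the schemas in the test.
    System.out.println(project(input, Arrays.asList("f2", "f1")));
  }
}
```

Matching by name rather than by position is what lets the output schema both drop and reorder fields in a single step.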

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184094
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 11/Jan/19 04:42
Start Date: 11/Jan/19 04:42
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-453376610
 
 
   LGTM if my comment is wrong. Nice code & tests, again. There's some conflict 
to be resolved.
 



Issue Time Tracking
---

Worklog Id: (was: 184094)
Time Spent: 9h 10m  (was: 9h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184093
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 11/Jan/19 04:41
Start Date: 11/Jan/19 04:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7373: 
[BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#discussion_r247002100
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastTest.java
 ##

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=183189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-183189
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 09/Jan/19 17:33
Start Date: 09/Jan/19 17:33
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-452782670
 
 
   @kennknowles did you have a chance to take a look?
 



Issue Time Tracking
---

Worklog Id: (was: 183189)
Time Spent: 8h 50m  (was: 8h 40m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180468
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 03/Jan/19 00:31
Start Date: 03/Jan/19 00:31
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-451027580
 
 
   Ah, sorry. Processing large post-holiday inbox too quickly and shallowly. 
I'll take a look.
 



Issue Time Tracking
---

Worklog Id: (was: 180468)
Time Spent: 8h 40m  (was: 8.5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180465
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 03/Jan/19 00:06
Start Date: 03/Jan/19 00:06
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-451023769
 
 
   @kennknowles no, it isn't connected to the Beam SQL pull request 
https://github.com/apache/beam/pull/6417; it's a transform from core
 



Issue Time Tracking
---

Worklog Id: (was: 180465)
Time Spent: 8.5h  (was: 8h 20m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180381
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 02/Jan/19 19:40
Start Date: 02/Jan/19 19:40
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-450963267
 
 
   Is this preempted by #6417 ?
 



Issue Time Tracking
---

Worklog Id: (was: 180381)
Time Spent: 8h 20m  (was: 8h 10m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2019-01-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180227
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 02/Jan/19 16:23
Start Date: 02/Jan/19 16:23
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix 
casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-450909373
 
 
   R: @kennknowles 
 



Issue Time Tracking
---

Worklog Id: (was: 180227)
Time Spent: 8h 10m  (was: 8h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179516
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 16:12
Start Date: 28/Dec/18 16:12
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7372: [BEAM-5918] Fix 
CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450384079
 
 
   lgtm
 



Issue Time Tracking
---

Worklog Id: (was: 179516)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179515
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 16:12
Start Date: 28/Dec/18 16:12
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7372: [BEAM-5918] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7372
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 179515)
Time Spent: 7h 50m  (was: 7h 40m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179504
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 15:37
Start Date: 28/Dec/18 15:37
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7373: [BEAM-5918] 
Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373
 
 
   Continuation of https://github.com/apache/beam/pull/7372.
   
   Refactors the tests and fixes casting of non-numeric types, which wasn't 
checked before.
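
The description does not spell out the bug, so the following is only a sketch of what checking non-numeric types during a widening cast can involve: identical types always pass, and differing types pass only when both are numeric and the target is at least as wide. `WideningCheckSketch`, its `Type` enum, `canWiden`, and `validate` are hypothetical names, not the code in this pull request; the error text only mirrors the `Can't cast` message format asserted in `CastTest`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a widening check that also covers non-numeric types. */
final class WideningCheckSketch {

  /** A tiny stand-in for schema field types; rank orders the integer types by width. */
  enum Type {
    INT16(1),
    INT32(2),
    INT64(3),
    STRING(-1); // a negative rank marks a non-numeric type

    final int rank;

    Type(int rank) {
      this.rank = rank;
    }

    boolean isNumeric() {
      return rank >= 0;
    }
  }

  /** Identical types always cast; otherwise both must be numeric and the target at least as wide. */
  static boolean canWiden(Type from, Type to) {
    if (from == to) {
      return true;
    }
    return from.isNumeric() && to.isNumeric() && from.rank <= to.rank;
  }

  /** Collects one error per output field whose input type cannot be widened to it. */
  static List<String> validate(Map<String, Type> input, Map<String, Type> output) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, Type> field : output.entrySet()) {
      Type from = input.get(field.getKey());
      if (from == null) {
        errors.add(field.getKey() + ": field is missing from the input");
      } else if (!canWiden(from, field.getValue())) {
        errors.add(field.getKey() + ": Can't cast '" + from + "' to '" + field.getValue() + "'");
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, Type> input = new LinkedHashMap<>();
    input.put("f0", Type.INT16);
    input.put("f1", Type.INT64);
    Map<String, Type> output = new LinkedHashMap<>();
    output.put("f0", Type.INT32);
    output.put("f1", Type.INT32);
    System.out.println(validate(input, output));
  }
}
```

With the schemas from `testTypeWidenFail`, `validate` reports a single error for `f1`, since INT64 cannot be widened to INT32. Without the `isNumeric` guard, a STRING field would be compared by rank as if it were a number.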
   
   
   
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 179504)
Time Spent: 7h 40m  (was: 7.5h)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179500
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 15:29
Start Date: 28/Dec/18 15:29
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7372: [BEAM-5918] Fix 
CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450377111
 
 
   @reuvenlax yes, it's hard to notice because the test uses POJOs that are 
declared a few screens later. I opened one more PR (+150 -350) that refactors 
the tests to use explicit rows and schemas, so each test fits on one screen.
 



Issue Time Tracking
---

Worklog Id: (was: 179500)
Time Spent: 7.5h  (was: 7h 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, will 
> convert rows between them. It must be possible to opt out of certain kinds 
> of conversions; for instance, converting ints to shorts can cause overflow. 
> As another example, a schema could have a nullable field that never holds a 
> NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
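
The difference between widening and checked narrowing in the list above can be sketched in plain Java. This is only an illustration under stated assumptions: `NumericCasts`, `widen`, and `narrowChecked` are hypothetical names, not part of Beam's `Cast` transform, and the sketch covers a single int value rather than a whole Row.

```java
/** Hypothetical helper (not Beam API): numeric casts with an explicit overflow check. */
final class NumericCasts {
  private NumericCasts() {}

  /** Widening int -> long: every int is representable as a long, so no check is needed. */
  static long widen(int value) {
    return value;
  }

  /** Narrowing int -> short: fail loudly at runtime instead of silently wrapping around. */
  static short narrowChecked(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("int " + value + " does not fit into a short");
    }
    return (short) value;
  }
}
```

A plain `(short) value` cast would wrap out-of-range values around without any error, which is presumably why the issue asks for narrowing to be opt-in and checked at runtime.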



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179496
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 15:27
Start Date: 28/Dec/18 15:27
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7372: [BEAM-5918] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7372#discussion_r244344695
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastTest.java
 ##
 @@ -112,15 +118,6 @@ public void testWeakedNullable() throws Exception {
     pipeline.run();
   }
 
-  @Test(expected = IllegalArgumentException.class)
-  @Category(NeedsRunner.class)
-  public void testWeakedNullableFail() throws Exception {
-    Schema inputSchema = pipeline.getSchemaRegistry().getSchema(Nullable1.class);
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(Nullable2.class);
-
-    Cast.widening(outputSchema).verifyCompatibility(inputSchema);
 
 Review comment:
   It's valid to cast from
   ```
   public static class Nullable1 {
     public Integer field1 = 42;
     public @Nullable Long field2 = null;
   }
   ```
   to
   ```
   public static class Nullable2 {
     public @Nullable Integer field1 = 42;
     public @Nullable Long field2 = null;
   }
   ```
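
The rule behind this comment (a non-nullable field may become nullable, but not the reverse) can be sketched as a standalone check. This is a rough illustration, not Beam's implementation: `NullabilityCheck` and `canWeaken` are hypothetical names, and a schema is modelled as a plain map from field name to a nullable flag instead of Beam's `Schema` class.

```java
import java.util.Map;

/** Hypothetical stand-in for a schema check (not Beam API): field name -> nullable flag. */
final class NullabilityCheck {
  private NullabilityCheck() {}

  /**
   * Returns true if every output field has a same-named input field and no field
   * goes from nullable to non-nullable. Field types are ignored in this sketch.
   */
  static boolean canWeaken(Map<String, Boolean> input, Map<String, Boolean> output) {
    for (Map.Entry<String, Boolean> field : output.entrySet()) {
      Boolean inputNullable = input.get(field.getKey());
      if (inputNullable == null) {
        return false; // the output field has no counterpart in the input
      }
      if (inputNullable && !field.getValue()) {
        return false; // nullable -> non-nullable could meet a NULL at runtime
      }
    }
    return true;
  }
}
```

Under this model, `Nullable1` (`field1` non-nullable, `field2` nullable) can be cast to `Nullable2` (both nullable), which is why a test expecting that cast to fail would be wrong; the opposite direction is rejected.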
 



Issue Time Tracking
---

Worklog Id: (was: 179496)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179492
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 15:16
Start Date: 28/Dec/18 15:16
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7372: [BEAM-5918] Fix 
CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450375048
 
 
   Question: was the test you removed simply incorrect?
 



Issue Time Tracking
---

Worklog Id: (was: 179492)
Time Spent: 7h 10m  (was: 7h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179489
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 15:07
Start Date: 28/Dec/18 15:07
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7372: [BEAM-5918] Fix 
CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450373464
 
 
   R: @kennknowles @reuvenlax 
   
   Please take a look; this fixes one of the failing tests.
 



Issue Time Tracking
---

Worklog Id: (was: 179489)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179473
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 14:44
Start Date: 28/Dec/18 14:44
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7372: [BEAM-5918] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7372
 
 
   Right now `CastTest` is failing; the build is green only because runner 
tests aren't part of Java Run PreCommit. If we want to enable runner tests, it 
would be better to fix the failing tests first.
   
   I have a PR with a bigger refactor; however, I want to start small and fix 
the failing tests first.
   
   
   



Issue Time Tracking
---

Worklog Id: (was: 179473)
Time Spent: 6h 50m  (was: 6h 40m)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179425
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 12:02
Start Date: 28/Dec/18 12:02
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7363: [BEAM-5918] 
[WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 179425)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179398
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 09:44
Start Date: 28/Dec/18 09:44
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7363: [BEAM-5918] 
[WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363
 
 
   Split into needsRunner and unit tests; otherwise, they don't run.
   



Issue Time Tracking
---

Worklog Id: (was: 179398)
Time Spent: 6.5h  (was: 6h 20m)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179396
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 09:44
Start Date: 28/Dec/18 09:44
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450329135
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 179396)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179397
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 28/Dec/18 09:44
Start Date: 28/Dec/18 09:44
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7363: [BEAM-5918] 
[WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 179397)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179218
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 27/Dec/18 20:20
Start Date: 27/Dec/18 20:20
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450224241
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 179218)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179205
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 27/Dec/18 19:42
Start Date: 27/Dec/18 19:42
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450217716
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 179205)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179208
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 27/Dec/18 19:49
Start Date: 27/Dec/18 19:49
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] 
Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450217716
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 179208)
Time Spent: 5h 50m  (was: 5h 40m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, will 
> convert rows between them. It must be possible to opt out of certain 
> kinds of conversions; for instance, converting ints to shorts can 
> cause overflow. As another example, a schema could have a nullable field that 
> never holds a NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
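
The widening, narrowing, and overflow-check items in the issue description can be illustrated in plain Java. This is only a minimal sketch that does not use Beam: the class `NarrowingSketch` and its methods `widenToLong` and `narrowToShort` are hypothetical names chosen for illustration and are not part of the Cast transform.

```java
/** Hypothetical helper, not Beam API: plain-Java widening and checked narrowing. */
final class NarrowingSketch {

  private NarrowingSketch() {}

  /** Widening (int -> long): every int fits into a long, so no check is needed. */
  static long widenToLong(int value) {
    return value;
  }

  /** Narrowing (int -> short) with a runtime overflow check instead of silent wrap-around. */
  static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("Value " + value + " does not fit into a short");
    }
    return (short) value;
  }
}
```

Without the range check, a bare `(short) 40000` cast would silently wrap around to `-25536`, which is the kind of overflow the description wants to catch at runtime.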



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-12-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179176
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 27/Dec/18 18:36
Start Date: 27/Dec/18 18:36
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #7363: [BEAM-5918] 
[WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363
 
 
   Split into needsRunner and unit tests, otherwise, they don't run.
   
 



Issue Time Tracking
---

Worklog Id: (was: 179176)
Time Spent: 5.5h  (was: 5h 20m)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162738
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 20:12
Start Date: 05/Nov/18 20:12
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add 
Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-436018476
 
 
   Ah, sorry, I neglected the squashing.




Issue Time Tracking
---

Worklog Id: (was: 162738)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162736
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 20:11
Start Date: 05/Nov/18 20:11
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #6888: [BEAM-5918] Add 
Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
index 86a0f4653d5..1587a6bbee7 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
@@ -292,7 +292,7 @@ public int hashCode() {
     INT16, // two-byte signed integer.
     INT32, // four-byte signed integer.
     INT64, // eight-byte signed integer.
-    DECIMAL, // Decimal integer
+    DECIMAL, // Arbitrary-precision decimal number
     FLOAT,
     DOUBLE,
     STRING, // String.
@@ -338,6 +338,47 @@ public boolean isMapType() {
     public boolean isCompositeType() {
       return COMPOSITE_TYPES.contains(this);
     }
+
+    public boolean isSubtypeOf(TypeName other) {
+      return other.isSupertypeOf(this);
+    }
+
+    public boolean isSupertypeOf(TypeName other) {
+      if (this == other) {
+        return true;
+      }
+
+      // defined only for numeric types
+      if (!isNumericType() || !other.isNumericType()) {
+        return false;
+      }
+
+      switch (this) {
+        case BYTE:
+          return false;
+
+        case INT16:
+          return other == BYTE;
+
+        case INT32:
+          return other == BYTE || other == INT16;
+
+        case INT64:
+          return other == BYTE || other == INT16 || other == INT32;
+
+        case FLOAT:
+          return false;
+
+        case DOUBLE:
+          return other == FLOAT;
+
+        case DECIMAL:
+          return other == FLOAT || other == DOUBLE;
+
+        default:
+          throw new AssertionError("Unexpected numeric type: " + this);
+      }
+    }
   }
 
   /**
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
new file mode 100644
index 000..3048806edf0
--- /dev/null
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
@@ -0,0 +1,440 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.schemas.utils.SchemaZipFold;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema()

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162737
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 20:11
Start Date: 05/Nov/18 20:11
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add 
Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-436018354
 
 
   Merged, but if you had other changes in progress, just open another PR from 
the branch.




Issue Time Tracking
---

Worklog Id: (was: 162737)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162705
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 19:07
Start Date: 05/Nov/18 19:07
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230873757
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,440 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.schemas.utils.SchemaZipFold;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Validator validator();
+
+  public static <T> Cast<T> of(Schema outputSchema, Validator validator) {
+    return new AutoValue_Cast<>(outputSchema, validator);
+  }
+
+  public static <T> Cast<T> widening(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Widening.of());
+  }
+
+  public static <T> Cast<T> narrowing(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Narrowing.of());
+  }
+
+  /** Describes compatibility errors during casting. */
+  @AutoValue
+  public abstract static class CompatibilityError implements Serializable {
+
+    public abstract List<String> path();
+
+    public abstract String message();
+
+    public static CompatibilityError create(List<String> path, String message) {
+      return new AutoValue_Cast_CompatibilityError(path, message);
+    }
+  }
+
+  /** Interface for statically validating casts. */
+  public interface Validator extends Serializable {
+    List<CompatibilityError> apply(Schema input, Schema output);
+  }
+
+  /**
+   * Widening changes to a type that can represent any possible value of the original type.
+   *
+   * <p>Standard widening conversions:
+   *
+   * <ul>
+   *   <li>BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT32 to INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT64 to FLOAT, DOUBLE, DECIMAL
+   *   <li>FLOAT to DOUBLE, DECIMAL
+   *   <li>DOUBLE to DECIMAL
+   * </ul>
+   *
+   * <p>Row widening:
+   *
+   * <ul>
+   *   <li>wider schema to schema with a subset of fields
+   *   <li>non-nullable fields to nullable fields
+   * </ul>
+   *
+   * <p>Widening doesn't lose information about the overall magnitude in the following cases:
+   *
+   * <ul>
+   *   <li>integral type to another integral type
+   *   <li>BYTE or INT16 to FLOAT, DOUBLE or DECIMAL
+   *   <li>INT32 to DOUBLE
+   * </ul>
+   *
+   * <p>Other conversions may cause loss of precision.
+   */
+  public static class Widening implements Validator {
+    private final Fold fold = new Fold();
+
+    public static Widening of() {
+      return new Widening();
+    }
+
+    @Override
+    public String toString() {
+      return "Cast.Widening";
+    }
+
+    @Override
+    public List<CompatibilityError> apply(final Schema input, f

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162697
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:52
Start Date: 05/Nov/18 18:52
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230868788
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SchemaZipFold.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.utils;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.collect.ImmutableList;
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.stream.Stream;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+
+/**
+ * Visitor that zips schemas, and accepts pairs of fields and their types.
+ *
+ * Values returned by `accept` are accumulated.
+ */
+public abstract class SchemaZipFold<T> implements Serializable {
 
 Review comment:
   It isn't optimized for few allocations or for speed, because it's called only a 
couple of times during graph construction and, as I understand it, 
performance isn't a concern here.
   
   The reason I extracted zip and fold is that otherwise it's hard to see 
the actual narrowing/widening logic. I agree that it is a bit out of step with 
the rest of the codebase. One alternative could be creating something like `class 
ZippedSchemas`
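
   To make the zip-and-fold idea concrete, here is a minimal, simplified sketch. It is not the `SchemaZipFold` API: the class `ZipFoldSketch`, the `FieldVisitor` interface, and the modeling of a flat schema as a map from field name to type name are all assumptions made for illustration. The traversal (zip) pairs up fields by name, and the results returned by the visitor are accumulated (fold), so the visitor contains only the comparison logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the zip-and-fold idea; not the SchemaZipFold API. */
final class ZipFoldSketch {

  /** Receives one pair of zipped fields and returns the errors found for that pair. */
  interface FieldVisitor {
    List<String> accept(String fieldName, String inputType, String outputType);
  }

  private ZipFoldSketch() {}

  /**
   * Pairs the fields of two flat schemas by name and accumulates whatever the visitor returns.
   * A schema is modeled as a map from field name to type name (an assumption of this sketch).
   */
  static List<String> zipFold(
      Map<String, String> input, Map<String, String> output, FieldVisitor visitor) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, String> field : output.entrySet()) {
      String inputType = input.get(field.getKey());
      if (inputType == null) {
        errors.add("Missing input field: " + field.getKey());
      } else {
        errors.addAll(visitor.accept(field.getKey(), inputType, field.getValue()));
      }
    }
    return errors;
  }
}
```

   With this split, a widening or narrowing rule is just a different `FieldVisitor`, while the traversal stays in one place.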




Issue Time Tracking
---

Worklog Id: (was: 162697)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162655
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:24
Start Date: 05/Nov/18 18:24
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add 
Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-435980683
 
 
   I will wait a short while if you want to make any last changes like touching 
up the javadoc, then I will go ahead and merge and we can do it in follow-up 
smaller PRs.




Issue Time Tracking
---

Worklog Id: (was: 162655)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162653
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:23
Start Date: 05/Nov/18 18:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230854619
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##
 @@ -338,6 +338,47 @@ public boolean isMapType() {
     public boolean isCompositeType() {
       return COMPOSITE_TYPES.contains(this);
     }
+
+    public boolean isSubtypeOf(TypeName other) {
+      return other.isSupertypeOf(this);
+    }
+
+    public boolean isSupertypeOf(TypeName other) {
 
 Review comment:
   In #6861 nullability is added for array elements and map values. It isn't 
expressed in the most natural "type system" way, but we should move towards 
treating a nullable `T` as `OPTIONAL` with automatic coercion to `T` rather 
than treating it as just a `T` with nullability as a side condition. It 
requires a fairly significant refactor to do so. Just something to keep in 
mind. It would affect this sub/supertype check.
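
   As a rough illustration of how folding nullability into the type would affect the check, consider the sketch below. It is an assumption-laden model, not Beam code: `NullableTypeSketch`, its `Base` enum, and the ordinal-based comparison are hypothetical, and only the integral cases are covered. A nullable type can accept a non-nullable value of the same or a narrower base type, but not the other way around.

```java
/** Hypothetical model, not Beam API: nullability folded into the supertype check. */
final class NullableTypeSketch {

  /** Integral base types only, declared from narrowest to widest. */
  enum Base {
    INT16,
    INT32,
    INT64
  }

  final Base base;
  final boolean nullable;

  NullableTypeSketch(Base base, boolean nullable) {
    this.base = base;
    this.nullable = nullable;
  }

  /** True if every value of {@code other}, including null when allowed, fits into this type. */
  boolean isSupertypeOf(NullableTypeSketch other) {
    // A non-nullable type has no way to represent the null of a nullable type.
    if (other.nullable && !nullable) {
      return false;
    }
    // Relies on the declaration order of Base: a wider type has a larger ordinal.
    return base.ordinal() >= other.base.ordinal();
  }
}
```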




Issue Time Tracking
---

Worklog Id: (was: 162653)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162651
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:23
Start Date: 05/Nov/18 18:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230858306
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SchemaZipFold.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.utils;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.collect.ImmutableList;
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+import java.util.stream.Stream;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+
+/**
+ * Visitor that zips schemas, and accepts pairs of fields and their types.
+ *
+ * Values returned by `accept` are accumulated.
+ */
+public abstract class SchemaZipFold<T> implements Serializable {
 
 Review comment:
   I understand this class, but maybe others won't love it vs nested switch 
statements / static recursive functions. Out of curiosity does it not add some 
allocation cost?
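
   For comparison, the static recursive alternative mentioned above might look roughly like the following. This is a hypothetical sketch, not code from the PR: `RecursiveCompareSketch` and its nested `Type` class are simplified stand-ins for Beam's schema types. Mismatches are appended to one shared list passed down the recursion, so no intermediate result lists are created per field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Beam code: schema comparison as one static recursive function. */
final class RecursiveCompareSketch {

  /** Minimal stand-in for a field type: a type name plus nested fields for rows. */
  static final class Type {
    final String name;
    final Map<String, Type> fields = new LinkedHashMap<>();

    Type(String name) {
      this.name = name;
    }

    /** Adds a nested field and returns this type to allow chaining. */
    Type field(String fieldName, Type type) {
      fields.put(fieldName, type);
      return this;
    }
  }

  private RecursiveCompareSketch() {}

  /** Walks both types in step and appends every mismatch to the shared errors list. */
  static void compare(String path, Type input, Type output, List<String> errors) {
    if (!input.name.equals(output.name)) {
      errors.add(path + ": " + input.name + " != " + output.name);
      return;
    }
    for (Map.Entry<String, Type> entry : output.fields.entrySet()) {
      Type inputField = input.fields.get(entry.getKey());
      String fieldPath = path.isEmpty() ? entry.getKey() : path + "." + entry.getKey();
      if (inputField == null) {
        errors.add(fieldPath + ": missing in input");
      } else {
        compare(fieldPath, inputField, entry.getValue(), errors);
      }
    }
  }
}
```

   The trade-off is that the traversal and the comparison rule are interleaved in one function, which is the readability concern that motivated extracting zip and fold.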




Issue Time Tracking
---

Worklog Id: (was: 162651)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162650
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:23
Start Date: 05/Nov/18 18:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230857351
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,440 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.schemas.utils.SchemaZipFold;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Validator validator();
+
+  public static <T> Cast<T> of(Schema outputSchema, Validator validator) {
+    return new AutoValue_Cast<>(outputSchema, validator);
+  }
+
+  public static <T> Cast<T> widening(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Widening.of());
+  }
+
+  public static <T> Cast<T> narrowing(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Narrowing.of());
+  }
+
+  /** Describes compatibility errors during casting. */
+  @AutoValue
+  public abstract static class CompatibilityError implements Serializable {
+
+    public abstract List<String> path();
+
+    public abstract String message();
+
+    public static CompatibilityError create(List<String> path, String message) {
+      return new AutoValue_Cast_CompatibilityError(path, message);
+    }
+  }
+
+  /** Interface for statically validating casts. */
+  public interface Validator extends Serializable {
+    List<CompatibilityError> apply(Schema input, Schema output);
+  }
+
+  /**
+   * Widening changes to a type that can represent any possible value of the original type.
+   *
+   * <p>Standard widening conversions:
+   *
+   * <ul>
+   *   <li>BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT32 to INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT64 to FLOAT, DOUBLE, DECIMAL
+   *   <li>FLOAT to DOUBLE, DECIMAL
+   *   <li>DOUBLE to DECIMAL
+   * </ul>
+   *
+   * <p>Row widening:
+   *
+   * <ul>
+   *   <li>wider schema to schema with a subset of fields
+   *   <li>non-nullable fields to nullable fields
+   * </ul>
+   *
+   * <p>Widening doesn't lose information about the overall magnitude in the following cases:
+   *
+   * <ul>
+   *   <li>integral type to another integral type
+   *   <li>BYTE or INT16 to FLOAT, DOUBLE or DECIMAL
+   *   <li>INT32 to DOUBLE
 
 Review comment:
   What about INT64? And why can only some go to DECIMAL? Also, adding 
nullability is widening too.
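
The magnitude list in the quoted Javadoc appears to follow the Java Language Specification's rules for widening primitive conversions, under which int to float, long to float and long to double may lose precision. A minimal sketch in plain Java (not Beam code; the class and method names are made up for illustration) shows why a 64-bit integer does not always survive a conversion to double, while a conversion to a decimal type stays exact:

```java
import java.math.BigDecimal;

/** Illustration only: hypothetical helper, not part of Beam. */
final class WideningPrecisionSketch {

  /** True if the long value is represented exactly after conversion to double. */
  static boolean exactAsDouble(long value) {
    // new BigDecimal(double) yields the exact binary value, so the comparison is exact.
    return BigDecimal.valueOf(value).compareTo(new BigDecimal((double) value)) == 0;
  }

  /** True if the int value is represented exactly after conversion to float. */
  static boolean exactAsFloat(int value) {
    // float to double is exact, so only the int to float step can round.
    return BigDecimal.valueOf((long) value).compareTo(new BigDecimal((double) (float) value)) == 0;
  }

  /** Converts a long to BigDecimal; this conversion never rounds. */
  static BigDecimal toDecimal(long value) {
    return BigDecimal.valueOf(value);
  }

  public static void main(String[] args) {
    long beyondDouble = (1L << 53) + 1; // needs 54 significant bits, double has 53
    int beyondFloat = (1 << 24) + 1; // needs 25 significant bits, float has 24

    System.out.println("long -> double exact: " + exactAsDouble(beyondDouble));
    System.out.println("int -> float exact: " + exactAsFloat(beyondFloat));
    System.out.println("int -> double exact: " + exactAsDouble(Integer.MAX_VALUE));
    System.out.println("long -> decimal: " + toDecimal(Long.MAX_VALUE));
  }
}
```

The comparison goes through BigDecimal rather than casting back to long, because a cast from double back to long saturates at Long.MAX_VALUE and could hide the rounding.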



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162649
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:23
Start Date: 05/Nov/18 18:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230857163
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,440 @@
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.schemas.utils.SchemaZipFold;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Validator validator();
+
+  public static <T> Cast<T> of(Schema outputSchema, Validator validator) {
+    return new AutoValue_Cast<>(outputSchema, validator);
+  }
+
+  public static <T> Cast<T> widening(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Widening.of());
+  }
+
+  public static <T> Cast<T> narrowing(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Narrowing.of());
+  }
+
+  /** Describes compatibility errors during casting. */
+  @AutoValue
+  public abstract static class CompatibilityError implements Serializable {
+
+    public abstract List<String> path();
+
+    public abstract String message();
+
+    public static CompatibilityError create(List<String> path, String message) {
+      return new AutoValue_Cast_CompatibilityError(path, message);
+    }
+  }
+
+  /** Interface for statically validating casts. */
+  public interface Validator extends Serializable {
+    List<CompatibilityError> apply(Schema input, Schema output);
+  }
+
+  /**
+   * Widening changes to a type that can represent any possible value of the original type.
+   *
+   * <p>Standard widening conversions:
+   *
+   * <ul>
+   *   <li>BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT32 to INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT64 to FLOAT, DOUBLE, DECIMAL
+   *   <li>FLOAT to DOUBLE, DECIMAL
+   *   <li>DOUBLE to DECIMAL
+   * </ul>
+   *
+   * <p>Row widening:
+   *
+   * <ul>
+   *   <li>wider schema to schema with a subset of fields
+   *   <li>non-nullable fields to nullable fields
+   * </ul>
+   *
+   * <p>Widening doesn't lose information about the overall magnitude in the following cases:
+   *
+   * <ul>
+   *   <li>integral type to another integral type
+   *   <li>BYTE or INT16 to FLOAT, DOUBLE or DECIMAL
+   *   <li>INT32 to DOUBLE
+   * </ul>
+   *
+   * <p>Other conversions may cause loss of precision.
+   */
+  public static class Widening implements Validator {
+    private final Fold fold = new Fold();
+
+    public static Widening of() {
+      return new Widening();
+    }
+
+    @Override
+    public String toString() {
+      return "Cast.Widening";
+    }
+
+    @Override
+    public List<CompatibilityError> apply(final Schema input

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162654
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:23
Start Date: 05/Nov/18 18:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230857029
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162652
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:23
Start Date: 05/Nov/18 18:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r230857941
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162638
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 18:12
Start Date: 05/Nov/18 18:12
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add 
Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-435976579
 
 
   Sorry for the delay - taking another look.




Issue Time Tracking
---

Worklog Id: (was: 162638)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-11-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162615
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 05/Nov/18 17:46
Start Date: 05/Nov/18 17:46
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] Add Cast 
transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-435967943
 
 
   @kennknowles Gentle ping, or, perhaps, there is somebody else who can help 
with the review? @akedin 




Issue Time Tracking
---

Worklog Id: (was: 162615)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=161320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-161320
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 31/Oct/18 21:21
Start Date: 31/Oct/18 21:21
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229872246
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,440 @@
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.schemas.utils.SchemaZipFold;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Validator validator();
+
+  public static <T> Cast<T> of(Schema outputSchema, Validator validator) {
+    return new AutoValue_Cast<>(outputSchema, validator);
+  }
+
+  public static <T> Cast<T> widening(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Widening.of());
+  }
+
+  public static <T> Cast<T> narrowing(Schema outputSchema) {
+    return new AutoValue_Cast<>(outputSchema, Narrowing.of());
+  }
+
+  /** Describes compatibility errors during casting. */
+  @AutoValue
+  public abstract static class CompatibilityError implements Serializable {
+
+    public abstract List<String> path();
+
+    public abstract String message();
+
+    public static CompatibilityError create(List<String> path, String message) {
+      return new AutoValue_Cast_CompatibilityError(path, message);
+    }
+  }
+
+  /** Interface for statically validating casts. */
+  public interface Validator extends Serializable {
+    List<CompatibilityError> apply(Schema input, Schema output);
+  }
+
+  /**
+   * Widening changes to a type that can represent any possible value of the original type.
+   *
+   * <p>Standard widening conversions:
+   *
+   * <ul>
+   *   <li>BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT32 to INT64, FLOAT, DOUBLE, DECIMAL
+   *   <li>INT64 to FLOAT, DOUBLE, DECIMAL
+   *   <li>FLOAT to DOUBLE, DECIMAL
+   *   <li>DOUBLE to DECIMAL
+   * </ul>
+   *
+   * <p>Row widening:
+   *
+   * <ul>
+   *   <li>wider schema to schema with a subset of fields
 
 Review comment:
   This doesn't match the definition of widening; it should probably be only in 
narrowing.
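
Under the definition quoted in the Javadoc (the target type can represent any possible value of the original type), dropping a field discards its values, which is the concern raised here. A rough sketch of a strict widening check follows. It is plain Java with a simplified, hypothetical stand-in for schemas (a map from field name to integral type), not the Beam Schema or Validator API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: simplified, hypothetical schema model, not Beam code. */
final class StrictWideningSketch {

  /** Integral types ordered from narrowest to widest. */
  enum IntType { BYTE, INT16, INT32, INT64 }

  /** Returns one message per field that cannot be cast without possibly losing information. */
  static List<String> errors(Map<String, IntType> input, Map<String, IntType> output) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, IntType> field : input.entrySet()) {
      IntType target = output.get(field.getKey());
      if (target == null) {
        // Projection: the output has no place for this field's values.
        errors.add(field.getKey() + ": dropped by projection");
      } else if (target.compareTo(field.getValue()) < 0) {
        // Enum order is narrow to wide, so a smaller target is a narrowing.
        errors.add(field.getKey() + ": can't widen " + field.getValue() + " to " + target);
      }
    }
    for (String name : output.keySet()) {
      if (!input.containsKey(name)) {
        errors.add(name + ": missing in input");
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, IntType> input = new HashMap<>();
    input.put("a", IntType.INT32);
    input.put("b", IntType.INT32);

    Map<String, IntType> projected = new HashMap<>();
    projected.put("a", IntType.INT64);

    // Field "a" widens cleanly, field "b" is reported because it is dropped.
    System.out.println(errors(input, projected));
  }
}
```

With this strict reading, projection is rejected by a widening validator and would have to be allowed by a narrowing one instead.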




Issue Time Tracking
---

Worklog Id: (was: 161320)
Time Spent: 3.5h  (was: 3h 20m)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=161296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-161296
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 31/Oct/18 20:17
Start Date: 31/Oct/18 20:17
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] Add Cast 
transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434832598
 
 
   @kennknowles Thanks for the feedback. I've simplified the implementation a lot. 
Please take a look.




Issue Time Tracking
---

Worklog Id: (was: 161296)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160761
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 18:28
Start Date: 30/Oct/18 18:28
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229432459
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@
+package org.apache.beam.sdk.schemas.transforms;
+
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ARRAY;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT32;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT64;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.MAP;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ROW;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Maps;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Nullability nullability();
+
+  public abstract Type type();
+
+  public abstract Shape shape();
+
+  /** Builder for {@link Cast}. */
+  @AutoValue.Builder
+  public abstract static class Builder<T> {
+
+    public abstract Builder<T> outputSchema(Schema schema);
+
+    public abstract Builder<T> nullability(Nullability nullability);
+
+    public abstract Builder<T> type(Type type);
+
+    public abstract Builder<T> shape(Shape shape);
+
+    public abstract Cast<T> build();
+  }
+
+  public static <T> Builder<T> builder() {
+    return new AutoValue_Cast.Builder<>();
+  }
+
+  public static <T> Cast<T> to(Schema outputSchema) {
+    return Cast.<T>builder()
+        .outputSchema(outputSchema)
+        .nullability(Nullability.IGNORE)
+        .type(Type.WIDEN)
+        .shape(Shape.PROJECTION)
+        .build();
+  }
+
+  public List<CompatibilityError> compatibility(Schema inputSchema) {
+    return Inference.compatibility(inputSchema, outputSchema(), nullability(), type(), shape());
+  }
+
+  public void verifyCompatibility(Schema inputSchema) {
+    List<CompatibilityError> errors = compatibility(inputSchema);
+
+    if (!errors.isEmpty()) {
+      String reason =
+          errors
+              .stream()
+              .map(x -> Joiner.on('.').join(x.path()) + ": " + x.message())
+              .collect(Collectors.joining("\n\t"));
+
+      throw new IllegalArgumentException("Cast isn't compatible:\n\t" + reason);
+    }
+  }
+
+  @Override
+  public PCollection<Row> expand(PCollection<T> input) {
+    Schema inputSchema = input.getSchema();
+
+    verifyCompatibility(inputSchema);
+
+    return input
+        .apply(
+            ParDo.of(
+                new DoFn<T, Row>() {
+                  // TODO: This should be the same as resolved so that Beam knows which fields
+                  // are being accessed. Currently Beam only supports 

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160732
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:59
Start Date: 30/Oct/18 16:59
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434384719
 
 
   Another approach to the problem would be to implement functional-style 
traversals over Schemas and Rows, and to implement casting on top of them. The 
same traversals could then be used to implement parsing of "special" fields if 
needed.
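
One way to picture the traversal idea: a single fold over the schema tree visits every leaf field together with its dotted path, and checks such as cast compatibility become functions passed to that fold. The following is a minimal sketch in plain Java, assuming a hypothetical miniature schema model (`TraversalSketch.Field` and `Visitor` are invented for illustration and are not Beam's `Schema` API).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical miniature schema model, for illustration only; not Beam's Schema. */
final class TraversalSketch {
  private TraversalSketch() {}

  /** A named field; a field with children plays the role of a nested ROW. */
  static final class Field {
    final String name;
    final String type;
    final List<Field> children;

    Field(String name, String type, Field... children) {
      this.name = name;
      this.type = type;
      this.children = Arrays.asList(children);
    }
  }

  /** Callback applied to every leaf field, threading an accumulator through. */
  interface Visitor<A> {
    A visit(A acc, String path, String type);
  }

  /** Pre-order fold over all leaf fields at or below {@code field}. */
  static <A> A foldLeaves(Field field, String parentPath, A acc, Visitor<A> visitor) {
    String path = parentPath.isEmpty() ? field.name : parentPath + "." + field.name;
    if (field.children.isEmpty()) {
      return visitor.visit(acc, path, field.type);
    }
    for (Field child : field.children) {
      acc = foldLeaves(child, path, acc, visitor);
    }
    return acc;
  }

  /** Example use of the fold: list every leaf as "path:type". */
  static List<String> describe(Field root) {
    return foldLeaves(
        root,
        "",
        new ArrayList<String>(),
        (acc, path, type) -> {
          acc.add(path + ":" + type);
          return acc;
        });
  }
}
```

A compatibility check between two schemas could be written the same way, with the visitor comparing each leaf against the field at the same path in the target schema.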


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 160732)
Time Spent: 3h  (was: 2h 50m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, 
> converts rows between them. It must be possible to opt out of certain 
> kinds of conversions; for instance, converting ints to shorts can cause 
> overflow. As another example, a schema could have a nullable field that 
> never holds a NULL value in practice because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing values (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
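
The widening, narrowing, and overflow-check items above can be illustrated without any Beam dependency. The following is a minimal sketch, assuming a hypothetical helper class `NarrowingSketch` (it is not part of Beam or of the pull request): widening is always safe, while narrowing fails loudly instead of silently wrapping around.

```java
/** Hypothetical helper, for illustration only; not part of Beam. */
final class NarrowingSketch {
  private NarrowingSketch() {}

  /** Widening int -> long never loses information. */
  static long widenToLong(int value) {
    return value;
  }

  /** Narrowing int -> short with a runtime overflow check. */
  static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("INT32 value " + value + " does not fit into INT16");
    }
    return (short) value;
  }
}
```

A plain `(short) value` cast would instead wrap out-of-range values around, which is the overflow the description asks to be able to opt out of.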





[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160730
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:55
Start Date: 30/Oct/18 16:55
Worklog Time Spent: 10m 
  Work Description: kanterov edited a comment on issue #6888: [BEAM-5918] 
[WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434380987
 
 
   @kennknowles yes, I agree, it's very controversial, but there are cases 
where it makes a lot of sense, for instance, BigQuery exports:
   - BQ: `DATE`, AVRO: `string`
   - BQ: `DATETIME`, AVRO: `string`
   - BQ: `TIMESTAMP`, AVRO: `long`, `logicalType=timestamp-micros`
   
   These need to be converted to `Row`. The idea is not to have a global 
registry, but to override conversions per transform. For instance:
   
   ```java
   Cast
     .to(...)
     .with(StringToDateConversion.of())
     .with(...)
     .build()
     .apply(...)
   ```
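
To make the "no global registry" point concrete, here is a minimal sketch in plain Java, assuming a hypothetical `ConversionSketch` class whose builder collects conversions for one instance only. None of these names are Beam API, type names are plain strings, and the date conversion assumes ISO-8601 text such as `2018-10-30`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of per-transform conversion overrides; not Beam API. */
final class ConversionSketch {
  private final Map<String, Function<Object, Object>> conversions;

  private ConversionSketch(Map<String, Function<Object, Object>> conversions) {
    this.conversions = conversions;
  }

  static Builder builder() {
    return new Builder();
  }

  /** Collects conversions for one cast instance only; nothing is registered globally. */
  static final class Builder {
    private final Map<String, Function<Object, Object>> conversions = new HashMap<>();

    Builder with(String inputType, String outputType, Function<Object, Object> conversion) {
      conversions.put(inputType + "->" + outputType, conversion);
      return this;
    }

    ConversionSketch build() {
      return new ConversionSketch(new HashMap<>(conversions));
    }
  }

  /** Applies the conversion registered on this instance, or fails if there is none. */
  Object convert(String inputType, String outputType, Object value) {
    Function<Object, Object> conversion = conversions.get(inputType + "->" + outputType);
    if (conversion == null) {
      throw new IllegalArgumentException("No conversion from " + inputType + " to " + outputType);
    }
    return conversion.apply(value);
  }

  /** ISO-8601 date string to LocalDate. */
  static Object stringToDate(Object value) {
    return LocalDate.parse((String) value);
  }

  /** Microseconds since the epoch (Avro timestamp-micros) to Instant. */
  static Object microsToInstant(Object value) {
    long micros = (Long) value;
    return Instant.ofEpochSecond(
        Math.floorDiv(micros, 1_000_000L), Math.floorMod(micros, 1_000_000L) * 1_000L);
  }
}
```

Two instances built with different `with(...)` calls never see each other's conversions, so a parsing rule added for one export does not change how other casts behave.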




Issue Time Tracking
---

Worklog Id: (was: 160730)
Time Spent: 2h 50m  (was: 2h 40m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160728
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:50
Start Date: 30/Oct/18 16:50
Worklog Time Spent: 10m 
  Work Description: kanterov edited a comment on issue #6888: [BEAM-5918] 
[WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434380987
 
 
   @kennknowles yes, I agree, it's very controversial, but there are cases 
where it makes a lot of sense, for instance, BigQuery exports:
   - BQ: `DATE`, AVRO: `string`
   - BQ: `DATETIME`, AVRO: `string`
   - BQ: `TIMESTAMP`, AVRO: `long`, `logicalType=timestamp-micros`
   
   These need to be converted to `Row`. The idea is not to have a global 
registry, but to override conversions per transform. For instance:
   
   ```java
   Cast
     .to(...)
     .register(StringToDateConversion.of())
     .register(...)
     .build()
     .apply(...)
   ```




Issue Time Tracking
---

Worklog Id: (was: 160728)
Time Spent: 2h 40m  (was: 2.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160727
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:49
Start Date: 30/Oct/18 16:49
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434380987
 
 
   @kennknowles yes, I agree, it's very controversial, but there are cases 
where it makes a lot of sense, for instance, BigQuery exports:
   - BQ: `DATE`, AVRO: `string`
   - BQ: `DATETIME`, AVRO: `string`
   - BQ: `TIMESTAMP`, AVRO: `long`, `logicalType=timestamp-micros`
   
   The idea is not to have a global registry, but to override conversions per 
transform. For instance:
   
   ```java
   Cast
     .to(...)
     .register(StringToDateConversion.of())
     .register(...)
     .build()
     .apply(...)
   ```




Issue Time Tracking
---

Worklog Id: (was: 160727)
Time Spent: 2.5h  (was: 2h 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160724
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:42
Start Date: 30/Oct/18 16:42
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #6888: [BEAM-5918] [WIP] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434377851
 
 
   I'm slightly more worried about fancy conversions like parsing. That is a 
bigger design decision. Casting to add/remove nullability or to narrow/widen 
integer types is simpler in my mind.
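
The nullability half of that statement fits in a few lines. Below is a minimal sketch in plain Java, assuming a hypothetical `NullabilitySketch` class. Only the name `Nullability.IGNORE` appears in the pull request diff; the `STRICT` and `WEAKEN` modes and both methods are invented here for illustration.

```java
/** Hypothetical sketch of nullability handling in a cast; not Beam API. */
final class NullabilitySketch {
  private NullabilitySketch() {}

  /** STRICT and WEAKEN are invented names; IGNORE mirrors Nullability.IGNORE in the diff. */
  enum Mode {
    STRICT,
    WEAKEN,
    IGNORE
  }

  /** Schema-level check: may a field be cast from inputNullable to outputNullable? */
  static boolean compatible(boolean inputNullable, boolean outputNullable, Mode mode) {
    if (inputNullable == outputNullable) {
      return true;
    }
    switch (mode) {
      case WEAKEN:
        // Only nullable=false -> nullable=true is allowed.
        return !inputNullable;
      case IGNORE:
        // Both directions are allowed; NULLs must then be caught at runtime.
        return true;
      default:
        return false;
    }
  }

  /** Runtime check used when a nullable input is cast to a non-nullable output. */
  static <T> T requireValue(T value, String fieldName) {
    if (value == null) {
      throw new IllegalArgumentException(
          "Field " + fieldName + " is non-nullable in the output schema but was NULL");
    }
    return value;
  }
}
```

Weakening never needs a runtime check, whereas ignoring nullability only moves the check from pipeline construction to element processing.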




Issue Time Tracking
---

Worklog Id: (was: 160724)
Time Spent: 2h 20m  (was: 2h 10m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160723
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:40
Start Date: 30/Oct/18 16:40
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229386957
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160722
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:37
Start Date: 30/Oct/18 16:37
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229385758
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160720
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:36
Start Date: 30/Oct/18 16:36
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229385262
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160717
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:35
Start Date: 30/Oct/18 16:35
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434375201
 
 
   @kennknowles thanks! I'm thinking about getting rid of the enumerations and 
using providers instead, something like:
   
   ```
   interface Provider {
     Optional get(ConversionRegistry registry, FieldType inputType, FieldType outputType);
   }
   ```
   
   The motivation is to be able to compose "custom" conversions, for instance, 
STRING to DATETIME, as well as in-house types with special behavior that can't 
be put into Beam.
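
A minimal sketch of how such providers might compose, in plain Java. Everything here is hypothetical: `ProviderSketch`, `Registry`, and the use of plain strings for type names are assumptions made for illustration, and the sketch also assumes each provider returns an optional conversion function. It shows why the registry is passed to each provider: a provider for a container type can ask the registry for the element conversion instead of knowing every element type itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of composable conversion providers; not Beam API. */
final class ProviderSketch {
  private ProviderSketch() {}

  /** Same shape as the proposed interface, with type names as plain strings. */
  interface Provider {
    Optional<Function<Object, Object>> get(Registry registry, String inputType, String outputType);
  }

  /** Asks each provider in order and returns the first conversion found. */
  static final class Registry {
    private final List<Provider> providers;

    Registry(Provider... providers) {
      this.providers = Arrays.asList(providers);
    }

    Optional<Function<Object, Object>> lookup(String inputType, String outputType) {
      for (Provider provider : providers) {
        Optional<Function<Object, Object>> found = provider.get(this, inputType, outputType);
        if (found.isPresent()) {
          return found;
        }
      }
      return Optional.empty();
    }
  }

  /** Leaf provider: INT32 -> INT64 widening. */
  static Optional<Function<Object, Object>> widenInt(Registry registry, String in, String out) {
    if (in.equals("INT32") && out.equals("INT64")) {
      Function<Object, Object> widen = v -> Long.valueOf(((Integer) v).longValue());
      return Optional.of(widen);
    }
    return Optional.empty();
  }

  /** Composite provider: ARRAY<X> -> ARRAY<Y>, delegating X -> Y back to the registry. */
  static Optional<Function<Object, Object>> array(Registry registry, String in, String out) {
    if (!in.startsWith("ARRAY<") || !out.startsWith("ARRAY<")) {
      return Optional.empty();
    }
    String inElement = in.substring(6, in.length() - 1);
    String outElement = out.substring(6, out.length() - 1);
    Optional<Function<Object, Object>> element = registry.lookup(inElement, outElement);
    if (!element.isPresent()) {
      return Optional.empty();
    }
    Function<Object, Object> convertAll =
        v -> {
          List<Object> result = new ArrayList<>();
          for (Object item : (List<?>) v) {
            result.add(element.get().apply(item));
          }
          return result;
        };
    return Optional.of(convertAll);
  }
}
```

With `new ProviderSketch.Registry(ProviderSketch::widenInt, ProviderSketch::array)`, a lookup for `ARRAY<INT32>` to `ARRAY<INT64>` succeeds by composition, while a lookup for `STRING` to `DATETIME` returns an empty result until a provider for it is added.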




Issue Time Tracking
---

Worklog Id: (was: 160717)
Time Spent: 1.5h  (was: 1h 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160719
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:35
Start Date: 30/Oct/18 16:35
Worklog Time Spent: 10m 
  Work Description: kanterov edited a comment on issue #6888: [BEAM-5918] 
[WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434375201
 
 
   @kennknowles thanks! I'm thinking about getting rid of the enumerations and 
using providers with a registry instead, something like:
   
   ```
   interface Provider {
     Optional get(ConversionRegistry registry, FieldType inputType, FieldType outputType);
   }
   ```
   
   The motivation is to be able to compose "custom" conversions, for instance 
STRING to DATETIME, as well as in-house types with special behavior that can't 
be put into Beam.
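
One way the provider-with-registry idea could look, as a self-contained sketch. Everything here is an assumption made for illustration: field types are reduced to plain strings, a conversion is modeled as `Function<Object, Object>`, and the `register`/`find` methods are hypothetical. None of this is taken from the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: a registry that asks each provider, in order, for a conversion. */
final class ConversionRegistry {
  /** A provider receives the registry so it can compose other conversions. */
  interface Provider {
    Optional<Function<Object, Object>> get(
        ConversionRegistry registry, String inputType, String outputType);
  }

  private final List<Provider> providers = new ArrayList<>();

  /** Adds a provider; earlier registrations take precedence. */
  ConversionRegistry register(Provider provider) {
    providers.add(provider);
    return this;
  }

  /** Returns the first conversion any provider offers, or empty if none applies. */
  Optional<Function<Object, Object>> find(String inputType, String outputType) {
    for (Provider provider : providers) {
      Optional<Function<Object, Object>> conversion =
          provider.get(this, inputType, outputType);
      if (conversion.isPresent()) {
        return conversion;
      }
    }
    return Optional.empty();
  }
}
```

Under these assumptions, a custom STRING to DATETIME conversion would be a provider that returns `Optional.of(v -> java.time.Instant.parse((String) v))` for that pair of types and `Optional.empty()` otherwise. An in-house type would be registered the same way without touching the registry itself.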


Issue Time Tracking
---

Worklog Id: (was: 160719)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160714
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:30
Start Date: 30/Oct/18 16:30
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229382903
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ARRAY;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT32;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT64;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.MAP;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ROW;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Maps;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast extends PTransform, PCollection> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Nullability nullability();
+
+  public abstract Type type();
+
+  public abstract Shape shape();
+
+  /** Builder for {@link Cast}. */
+  @AutoValue.Builder
+  public abstract static class Builder {
+
+    public abstract Builder outputSchema(Schema schema);
+
+    public abstract Builder nullability(Nullability nullability);
+
+    public abstract Builder type(Type type);
+
+    public abstract Builder shape(Shape shape);
+
+    public abstract Cast build();
+  }
+
+  public static  Builder builder() {
+    return new AutoValue_Cast.Builder();
+  }
+
+  public static  Cast to(Schema outputSchema) {
+    return Cast.builder()
+        .outputSchema(outputSchema)
+        .nullability(Nullability.IGNORE)
+        .type(Type.WIDEN)
+        .shape(Shape.PROJECTION)
+        .build();
+  }
+
+  public List compatibility(Schema inputSchema) {
+    return Inference.compatibility(inputSchema, outputSchema(), nullability(), type(), shape());
+  }
+
+  public void verifyCompatibility(Schema inputSchema) {
+    List errors = compatibility(inputSchema);
+
+    if (!errors.isEmpty()) {
+      String reason =
+          errors
+              .stream()
+              .map(x -> Joiner.on('.').join(x.path()) + ": " + x.message())
+              .collect(Collectors.joining("\n\t"));
+
+      throw new IllegalArgumentException("Cast isn't compatible:\n\t" + reason);
+    }
+  }
+
+  @Override
+  public PCollection expand(PCollection input) {
+    Schema inputSchema = input.getSchema();
+
+    verifyCompatibility(inputSchema);
+
+    return input
+        .apply(
+            ParDo.of(
+                new DoFn() {
+                  // TODO: This should be the same as resolved so that Beam knows which fields
+                  // are being accessed. Currently Beam only supports 
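
The `verifyCompatibility` method in the diff above collects every incompatibility and then fails once with a combined message. A rough standalone sketch of that pattern follows. It rests on loud assumptions: schemas are simplified to maps from field name to type name, `FieldError` and `CompatibilityCheck` are hypothetical names, and `String.join` stands in for Guava's `Joiner`. It does not reproduce the `Inference` logic from the pull request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch: collect all cast errors per field, then fail once. */
final class CompatibilityCheck {
  /** One problem found at a (possibly nested) field path. */
  static final class FieldError {
    final List<String> path;
    final String message;

    FieldError(List<String> path, String message) {
      this.path = path;
      this.message = message;
    }
  }

  /** Input fields absent from the output are ignored, which models projection. */
  static List<FieldError> compatibility(Map<String, String> input, Map<String, String> output) {
    List<FieldError> errors = new ArrayList<>();
    for (Map.Entry<String, String> field : output.entrySet()) {
      List<String> path = Collections.singletonList(field.getKey());
      String inputType = input.get(field.getKey());
      if (inputType == null) {
        errors.add(new FieldError(path, "field is missing in input schema"));
      } else if (!inputType.equals(field.getValue())) {
        errors.add(new FieldError(path, "can't cast " + inputType + " to " + field.getValue()));
      }
    }
    return errors;
  }

  /** Throws a single exception that lists every incompatible field. */
  static void verifyCompatibility(Map<String, String> input, Map<String, String> output) {
    List<FieldError> errors = compatibility(input, output);
    if (!errors.isEmpty()) {
      String reason =
          errors.stream()
              .map(e -> String.join(".", e.path) + ": " + e.message)
              .collect(Collectors.joining("\n\t"));
      throw new IllegalArgumentException("Cast isn't compatible:\n\t" + reason);
    }
  }
}
```

Reporting all errors at once, rather than stopping at the first, lets a pipeline author fix a mismatched schema in one pass.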

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160713
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:28
Start Date: 30/Oct/18 16:28
Worklog Time Spent: 10m 
  Work Description: kanterov commented on a change in pull request #6888: 
[BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229382088
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160706
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:20
Start Date: 30/Oct/18 16:20
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229376438
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160704
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:20
Start Date: 30/Oct/18 16:20
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229377635
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160707
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:20
Start Date: 30/Oct/18 16:20
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229378175
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160705
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:20
Start Date: 30/Oct/18 16:20
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229376846
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
 ##
 @@ -0,0 +1,582 @@

[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160703
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:15
Start Date: 30/Oct/18 16:15
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434366781
 
 
   I want to redo the part that resolves how to convert schemas.


Issue Time Tracking
---

Worklog Id: (was: 160703)
Time Spent: 40m  (was: 0.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, 
> converts rows between them. It must be possible to opt out of certain kinds 
> of conversions; for instance, converting ints to shorts can cause overflow. 
> As another example, a schema could have a nullable field that never holds a 
> NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
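
The widening and narrowing requirements above, including the runtime overflow check, can be sketched in plain Java without any Beam dependency. The class and method names here (`NarrowingSketch`, `widen`, `narrowChecked`) are illustrative assumptions and not part of the Beam API; the actual transform may handle these conversions differently.

```java
class NarrowingSketch {
  /** Widening int -> long: every int fits, so no check is needed. */
  static long widen(int value) {
    return value;
  }

  /** Narrowing int -> short: fail loudly instead of silently wrapping around. */
  static short narrowChecked(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("short overflow: " + value);
    }
    return (short) value;
  }

  public static void main(String[] args) {
    System.out.println(widen(42));
    System.out.println(narrowChecked(1234));
    try {
      // 40000 is outside the short range, so the checked cast rejects it.
      narrowChecked(40000);
    } catch (ArithmeticException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A plain `(short) value` cast would accept the out-of-range input and wrap it to a different number, which is why an opt-in or opt-out for narrowing matters.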



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160695
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:06
Start Date: 30/Oct/18 16:06
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #6888: [BEAM-5918] Add Cast 
transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434363483
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 160695)
Time Spent: 20m  (was: 10m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>  Reporter: Gleb Kanterov
>  Assignee: Kenneth Knowles
>  Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, 
> converts rows between them. It must be possible to opt out of certain kinds 
> of conversions; for instance, converting ints to shorts can cause overflow. 
> As another example, a schema could have a nullable field that never holds a 
> NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))





[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160696
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 16:07
Start Date: 30/Oct/18 16:07
Worklog Time Spent: 10m 
  Work Description: kanterov removed a comment on issue #6888: [BEAM-5918] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434363483
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 160696)
Time Spent: 0.5h  (was: 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>  Reporter: Gleb Kanterov
>  Assignee: Kenneth Knowles
>  Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, 
> converts rows between them. It must be possible to opt out of certain kinds 
> of conversions; for instance, converting ints to shorts can cause overflow. 
> As another example, a schema could have a nullable field that never holds a 
> NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))





[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows

2018-10-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160685
 ]

ASF GitHub Bot logged work on BEAM-5918:


Author: ASF GitHub Bot
Created on: 30/Oct/18 15:47
Start Date: 30/Oct/18 15:47
Worklog Time Spent: 10m 
  Work Description: kanterov opened a new pull request #6888: [BEAM-5918] 
Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888
 
 
   Casts rows from one schema into another. Implements:
   - widening values (e.g., int -> long), to be extended with more conversions
   - narrowing (e.g., int -> short), to be extended with more conversions
   - ignoring nullability (nullable=true -> nullable=false)
   - weakening nullability (nullable=false -> nullable=true)
   - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
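
As a rough illustration of the projection and nullability items above, here is a minimal sketch in plain Java that models a row as a `Map` and an output schema as a list of fields. `RowCastSketch`, `Field`, and `cast` are hypothetical names made up for this example; they are not the API introduced by this pull request.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowCastSketch {
  /** Hypothetical stand-in for a schema field: a name plus a nullability flag. */
  static final class Field {
    final String name;
    final boolean nullable;

    Field(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }
  }

  /**
   * Projects a row onto the output fields. A null value is accepted only
   * when the output field is nullable; otherwise the cast fails at runtime.
   */
  static Map<String, Object> cast(Map<String, Object> row, List<Field> output) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Field field : output) {
      if (!row.containsKey(field.name)) {
        throw new IllegalArgumentException("missing field: " + field.name);
      }
      Object value = row.get(field.name);
      if (value == null && !field.nullable) {
        throw new IllegalArgumentException("null in non-nullable field: " + field.name);
      }
      result.put(field.name, value);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("a", 1);
    row.put("b", 2);

    // Projection: Schema(a, b) -> Schema(a) keeps only field "a".
    System.out.println(cast(row, Arrays.asList(new Field("a", false))));

    // Ignoring nullability is only safe while no NULL actually shows up.
    row.put("a", null);
    try {
      cast(row, Arrays.asList(new Field("a", false)));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Weakening nullability (nullable=false -> nullable=true) needs no runtime check in this model, since every non-null value is also valid for a nullable field.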
   
   It would be very useful for Row-based IOs. For instance, BeamBigQueryTable 
could be implemented with org.apache.beam.sdk.schemas.utils.AvroUtils and Cast, 
which would make it more flexible; at the moment it is very restrictive about 
the schema.
   
   Another example is reading an Avro GenericRecord as a user-provided POJO, 
[BEAM-5807](https://issues.apache.org/jira/browse/BEAM-5807).
   
   I want to get an initial round of feedback before polishing Javadoc, API, etc.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.a