[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184339 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 11/Jan/19 20:33
Start Date: 11/Jan/19 20:33
Worklog Time Spent: 10m
Work Description: kennknowles commented on pull request #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
Worklog Id: (was: 184339)
Time Spent: 9h 40m (was: 9.5h)

> Add Cast transform for Rows
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 9h 40m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas,
> converts rows between them. It must be possible to opt out of certain
> kinds of conversions; for instance, converting ints to shorts can cause
> overflow. As another example, a schema could have a nullable field that
> never holds a NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
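The widening and narrowing items in the issue description above can be illustrated with a small standalone sketch. This is plain Java, not Beam's Cast implementation; the class name `NarrowingSketch` and its methods are hypothetical and only show what a "runtime check for overflow while narrowing" might look like for int -> short.

```java
/** Hypothetical illustration only; not part of Apache Beam. */
public final class NarrowingSketch {
  private NarrowingSketch() {}

  /** Widening int -> long: every int fits in a long, so no check is needed. */
  public static long widen(int value) {
    return value;
  }

  /** Narrowing int -> short: reject values outside the short range instead of wrapping. */
  public static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("short overflow: " + value);
    }
    return (short) value;
  }

  public static void main(String[] args) {
    System.out.println(widen(2));
    System.out.println(narrowToShort(1));
    try {
      narrowToShort(40_000); // outside the short range
    } catch (ArithmeticException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A plain `(short) value` cast would silently wrap on overflow, which is why the issue asks for an explicit check and for a way to opt out of narrowing altogether.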
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184183 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 11/Jan/19 12:31
Start Date: 11/Jan/19 12:31
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-453502723

@kennknowles thanks for the review! I fixed the code and rebased

Issue Time Tracking
Worklog Id: (was: 184183)
Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184147 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 11/Jan/19 10:18
Start Date: 11/Jan/19 10:18
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#discussion_r247065259

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastTest.java ##
@@ -36,137 +32,252 @@ import org.junit.Rule;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
+import org.junit.rules.ExpectedException;

 /** Tests for {@link Cast}. */
 public class CastTest {

   @Rule public final transient TestPipeline pipeline = TestPipeline.create();
+  @Rule public transient ExpectedException expectedException = ExpectedException.none();

   @Test
   @Category(NeedsRunner.class)
-  public void testProjection() throws Exception {
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(Projection2.class);
-    PCollection pojos =
+  public void testProjection() {
+    Schema inputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT16),
+            Schema.Field.of("f1", Schema.FieldType.INT32),
+            Schema.Field.of("f2", Schema.FieldType.STRING));
+
+    // remove f0 and reorder f1 and f2
+    Schema outputSchema =
+        Schema.of(
+            Schema.Field.of("f2", Schema.FieldType.STRING),
+            Schema.Field.of("f1", Schema.FieldType.INT32));
+
+    Row input = Row.withSchema(inputSchema).addValues((short) 1, 2, "3").build();
+    Row expected = Row.withSchema(outputSchema).addValues("3", 2).build();
+
+    PCollection output =
         pipeline
-            .apply(Create.of(new Projection1()))
-            .apply(Cast.widening(outputSchema))
-            .apply(Convert.to(Projection2.class));
+            .apply(Create.of(input).withRowSchema(inputSchema))
+            .apply(Cast.widening(outputSchema));
+
+    PAssert.that(output).containsInAnyOrder(expected);

-    PAssert.that(pojos).containsInAnyOrder(new Projection2());
     pipeline.run();
   }

   @Test
   @Category(NeedsRunner.class)
-  public void testTypeWiden() throws Exception {
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden2.class);
+  public void testTypeWiden() {
+    Schema inputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT16),
+            Schema.Field.of("f1", Schema.FieldType.INT32));
+
+    Schema outputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT32),
+            Schema.Field.of("f1", Schema.FieldType.INT64));

-    PCollection pojos =
+    Row input = Row.withSchema(inputSchema).addValues((short) 1, 2).build();
+    Row expected = Row.withSchema(outputSchema).addValues(1, 2L).build();
+
+    PCollection output =
         pipeline
-            .apply(Create.of(new TypeWiden1()))
-            .apply(Cast.widening(outputSchema))
-            .apply(Convert.to(TypeWiden2.class));
+            .apply(Create.of(input).withRowSchema(inputSchema))
+            .apply(Cast.widening(outputSchema));
+
+    PAssert.that(output).containsInAnyOrder(expected);

-    PAssert.that(pojos).containsInAnyOrder(new TypeWiden2());
     pipeline.run();
   }

   @Test
-  @Category(NeedsRunner.class)
-  public void testTypeNarrow() throws Exception {
-    // narrowing is the opposite of widening
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden1.class);
+  public void testTypeWidenFail() {
+    Schema inputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT16),
+            Schema.Field.of("f1", Schema.FieldType.INT64));

-    PCollection pojos =
-        pipeline
-            .apply(Create.of(new TypeWiden2()))
-            .apply(Cast.narrowing(outputSchema))
-            .apply(Convert.to(TypeWiden1.class));
+    Schema outputSchema =
+        Schema.of(
+            Schema.Field.of("f0", Schema.FieldType.INT32),
+            Schema.Field.of("f1", Schema.FieldType.INT32));

-    PAssert.that(pojos).containsInAnyOrder(new TypeWiden1());
-    pipeline.run();
-  }
+    expectedException.expect(IllegalArgumentException.class);
+    expectedException.expectMessage(containsString("f1: Can't cast 'INT64' to 'INT32'"));

-  @Test(expected = IllegalArgumentException.class)
-  @Category(NeedsRunner.class)
-  public void testTypeNarrowFail() throws Exception {
-    // narrowing is the opposite of widening
-    Schema inputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden2.class);
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(TypeWiden1.class);
-
-C
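The refactored testProjection in the diff above drops f0 and reorders f1 and f2 by naming the wanted fields in the output schema. The idea of a name-driven projection can be sketched without Beam, using a map in place of a Row. The class `ProjectionSketch` and its `project` method are hypothetical stand-ins, not the Cast transform's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Apache Beam. */
public final class ProjectionSketch {
  private ProjectionSketch() {}

  /** Returns the input values in the order given by outputFields, dropping all other fields. */
  public static List<Object> project(Map<String, Object> input, List<String> outputFields) {
    List<Object> out = new ArrayList<>();
    for (String name : outputFields) {
      if (!input.containsKey(name)) {
        throw new IllegalArgumentException(name + ": field is missing from the input");
      }
      out.add(input.get(name));
    }
    return out;
  }

  public static void main(String[] args) {
    // Same values as the test row: f0 = (short) 1, f1 = 2, f2 = "3".
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("f0", (short) 1);
    row.put("f1", 2);
    row.put("f2", "3");

    // Remove f0 and reorder f1 and f2, as in the test's output schema.
    System.out.println(project(row, Arrays.asList("f2", "f1")));
  }
}
```

Looking fields up by name rather than by position is what lets the output schema reorder and drop fields freely.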
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184094 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 11/Jan/19 04:42
Start Date: 11/Jan/19 04:42
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-453376610

LGTM if my comment is wrong. Nice code & tests, again. There's some conflict to be resolved.

Issue Time Tracking
Worklog Id: (was: 184094)
Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=184093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184093 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 11/Jan/19 04:41
Start Date: 11/Jan/19 04:41
Worklog Time Spent: 10m
Work Description: kennknowles commented on pull request #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#discussion_r247002100

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastTest.java ##
@@ -36,137 +32,252 @@
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=183189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-183189 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 09/Jan/19 17:33
Start Date: 09/Jan/19 17:33
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-452782670

@kennknowles did you have a chance to take a look?

Issue Time Tracking
Worklog Id: (was: 183189)
Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180468 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 03/Jan/19 00:31
Start Date: 03/Jan/19 00:31
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-451027580

Ah, sorry. Processing large post-holiday inbox too quickly and shallowly. I'll take a look.

Issue Time Tracking
Worklog Id: (was: 180468)
Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180465 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 03/Jan/19 00:06
Start Date: 03/Jan/19 00:06
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-451023769

@kennknowles no, it isn't connected to Beam SQL (https://github.com/apache/beam/pull/6417); it's a transform from core.

Issue Time Tracking
Worklog Id: (was: 180465)
Time Spent: 8.5h (was: 8h 20m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180381 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 02/Jan/19 19:40
Start Date: 02/Jan/19 19:40
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-450963267

Is this preempted by #6417?

Issue Time Tracking
Worklog Id: (was: 180381)
Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=180227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180227 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 02/Jan/19 16:23
Start Date: 02/Jan/19 16:23
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373#issuecomment-450909373

R: @kennknowles

Issue Time Tracking
Worklog Id: (was: 180227)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179516 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 28/Dec/18 16:12
Start Date: 28/Dec/18 16:12
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450384079

lgtm

Issue Time Tracking
Worklog Id: (was: 179516)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179515 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 28/Dec/18 16:12
Start Date: 28/Dec/18 16:12
Worklog Time Spent: 10m
Work Description: reuvenlax commented on pull request #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372

Issue Time Tracking
Worklog Id: (was: 179515)
Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179504 ]

ASF GitHub Bot logged work on BEAM-5918:
Author: ASF GitHub Bot
Created on: 28/Dec/18 15:37
Start Date: 28/Dec/18 15:37
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7373: [BEAM-5918] Fix casting of non-numeric types
URL: https://github.com/apache/beam/pull/7373

Continuation of https://github.com/apache/beam/pull/7372. Refactoring of tests, and fixing casting of non-numeric types that wasn't checked before.

Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.

Issue Time Tracking
Worklog Id: (was: 179504)
Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179500 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 15:29
Start Date: 28/Dec/18 15:29
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450377111

@reuvenlax yes, it's hard to notice because it's using POJOs that are declared a few screens later. I did one more PR (+150 -350) to refactor and use explicit rows and schemas, so each test will fit on one screen.

Issue Time Tracking
---
Worklog Id: (was: 179500)
Time Spent: 7.5h (was: 7h 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, will
> convert rows between them. It must be possible to opt out of certain
> kinds of conversions; for instance, converting ints to shorts can
> cause overflow. As another example, a schema could have a nullable field that
> never holds a NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
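The widening, narrowing, and projection requirements listed in the issue description can be illustrated in plain Java. This is only a minimal sketch under stated assumptions: it is not Beam's `Cast` implementation, rows are modelled as ordinary maps rather than Beam `Row` objects, and the names `CastSketch`, `widen`, `narrowToShort`, and `project` are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java analogue of the conversions listed in the
// issue. All names here are hypothetical and not part of Beam.
final class CastSketch {

  // Widening int -> long never loses information, so no check is needed.
  static long widen(int value) {
    return value;
  }

  // Narrowing int -> short may overflow, so the range is checked at runtime
  // instead of silently truncating the value.
  static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("int value " + value + " overflows short");
    }
    return (short) value;
  }

  // Projection keeps only the requested fields, in the requested order.
  static Map<String, Object> project(Map<String, Object> row, List<String> fields) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String field : fields) {
      if (!row.containsKey(field)) {
        throw new IllegalArgumentException("missing field: " + field);
      }
      out.put(field, row.get(field));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(widen(42));
    System.out.println(narrowToShort(42));
    try {
      narrowToShort(40_000); // outside the short range
    } catch (ArithmeticException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    Map<String, Object> row = new LinkedHashMap<>();
    row.put("a", 1);
    row.put("b", 2);
    System.out.println(project(row, Arrays.asList("a")));
  }
}
```

Throwing on overflow rather than truncating matches the "runtime check for overflow while narrowing" item; a caller who wants to opt out of narrowing entirely would simply not call the narrowing method.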
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179496 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 15:27
Start Date: 28/Dec/18 15:27
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372#discussion_r244344695

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/CastTest.java ##

@@ -112,15 +118,6 @@ public void testWeakedNullable() throws Exception {
     pipeline.run();
   }

-  @Test(expected = IllegalArgumentException.class)
-  @Category(NeedsRunner.class)
-  public void testWeakedNullableFail() throws Exception {
-    Schema inputSchema = pipeline.getSchemaRegistry().getSchema(Nullable1.class);
-    Schema outputSchema = pipeline.getSchemaRegistry().getSchema(Nullable2.class);
-
-    Cast.widening(outputSchema).verifyCompatibility(inputSchema);

Review comment:
It's valid to cast from

```
public static class Nullable1 {
  public Integer field1 = 42;
  public @Nullable Long field2 = null;
```

to

```
public static class Nullable2 {
  public @Nullable Integer field1 = 42;
  public @Nullable Long field2 = null;
```

Issue Time Tracking
---
Worklog Id: (was: 179496)
Time Spent: 7h 20m (was: 7h 10m)
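The rule behind the review comment above (a non-nullable field may be cast to a nullable one, while the reverse is unsafe) can be sketched in plain Java. This is a hedged illustration, not how Beam's `Cast.verifyCompatibility` is implemented: schemas are reduced to maps from field name to a nullability flag, and `NullabilitySketch` and `incompatibleFields` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: schemas are modelled as field name -> "is nullable".
// Names are hypothetical and not part of Beam.
final class NullabilitySketch {

  // Returns the output fields that cannot be filled safely from the input:
  // either the field is missing, or a nullable input would feed a
  // non-nullable output. Non-nullable -> nullable is always accepted.
  static List<String> incompatibleFields(
      Map<String, Boolean> inputNullable, Map<String, Boolean> outputNullable) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, Boolean> out : outputNullable.entrySet()) {
      Boolean in = inputNullable.get(out.getKey());
      if (in == null) {
        errors.add(out.getKey()); // field absent from the input schema
      } else if (in && !out.getValue()) {
        errors.add(out.getKey()); // a NULL could reach a non-nullable field
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    // Mirrors Nullable1: field1 is non-nullable, field2 is nullable.
    Map<String, Boolean> nullable1 = new LinkedHashMap<>();
    nullable1.put("field1", false);
    nullable1.put("field2", true);

    // Mirrors Nullable2: both fields are nullable.
    Map<String, Boolean> nullable2 = new LinkedHashMap<>();
    nullable2.put("field1", true);
    nullable2.put("field2", true);

    System.out.println(incompatibleFields(nullable1, nullable2)); // prints []
    System.out.println(incompatibleFields(nullable2, nullable1)); // prints [field1]
  }
}
```

Under this rule, casting from `Nullable1` to `Nullable2` reports no errors, which is consistent with the removed test having expected a failure that should not occur.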
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179492 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 15:16
Start Date: 28/Dec/18 15:16
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450375048

Question: was the test you removed simply incorrect?

Issue Time Tracking
---
Worklog Id: (was: 179492)
Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179489 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 15:07
Start Date: 28/Dec/18 15:07
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372#issuecomment-450373464

R: @kennknowles @reuvenlax

Please take a look; this fixes one of the failing tests.

Issue Time Tracking
---
Worklog Id: (was: 179489)
Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179473 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 14:44
Start Date: 28/Dec/18 14:44
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7372: [BEAM-5918] Fix CastTest
URL: https://github.com/apache/beam/pull/7372

Right now `CastTest` is failing; the build is green only because runner tests aren't part of the Java PreCommit run. If we want to enable runner tests, it would be better to fix the failing tests first. I have a PR with a bigger refactor; however, I want to start small and fix the failing tests first.

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
Issue Time Tracking
---
Worklog Id: (was: 179473)
Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179425 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 12:02
Start Date: 28/Dec/18 12:02
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363

Issue Time Tracking
---
Worklog Id: (was: 179425)
Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179398 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 09:44
Start Date: 28/Dec/18 09:44
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363

Split the tests into `NeedsRunner` tests and unit tests; otherwise, they don't run.
Issue Time Tracking
---
Worklog Id: (was: 179398)
Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179396 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 09:44
Start Date: 28/Dec/18 09:44
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450329135

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 179396)
Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179397 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 28/Dec/18 09:44
Start Date: 28/Dec/18 09:44
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363

Issue Time Tracking
---
Worklog Id: (was: 179397)
Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179218 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 27/Dec/18 20:20
Start Date: 27/Dec/18 20:20
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450224241

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 179218)
Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179205 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 27/Dec/18 19:42
Start Date: 27/Dec/18 19:42
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450217716

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 179205)
Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179208 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 27/Dec/18 19:49
Start Date: 27/Dec/18 19:49
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363#issuecomment-450217716

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 179208)
Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=179176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-179176 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 27/Dec/18 18:36
Start Date: 27/Dec/18 18:36
Worklog Time Spent: 10m
Work Description: kanterov commented on pull request #7363: [BEAM-5918] [WIP] Fix CastTest
URL: https://github.com/apache/beam/pull/7363

Split the tests into `NeedsRunner` tests and unit tests; otherwise, they don't run.
Issue Time Tracking
---
Worklog Id: (was: 179176)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162738 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 20:12 Start Date: 05/Nov/18 20:12 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#issuecomment-436018476 Ah, sorry I neglected the squashing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 162738) Time Spent: 5h 20m (was: 5h 10m) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.org/jira/browse/BEAM-5918 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > There is a need for a generic transform that given two Row schemas will > convert rows between them. There must be a possibility to opt-out from > certain kind of conversions, for instance, converting ints to shorts can > cause overflow. Another example, a schema could have a nullable field, but > never have NULL value in practice, because it was filtered out. > What is needed: > - widening values (e.g., int -> long) > - narrowing (e.g., int -> short) > - runtime check for overflow while narrowing > - ignoring nullability (nullable=true -> nullable=false) > - weakening nullability (nullable=false -> nullable=true) > - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162736 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 20:11 Start Date: 05/Nov/18 20:11 Worklog Time Spent: 10m Work Description: kennknowles closed pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
index 86a0f4653d5..1587a6bbee7 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
@@ -292,7 +292,7 @@ public int hashCode() {
     INT16, // two-byte signed integer.
     INT32, // four-byte signed integer.
     INT64, // eight-byte signed integer.
-    DECIMAL, // Decimal integer
+    DECIMAL, // Arbitrary-precision decimal number
     FLOAT,
     DOUBLE,
     STRING, // String.
@@ -338,6 +338,47 @@ public boolean isMapType() {
     public boolean isCompositeType() {
       return COMPOSITE_TYPES.contains(this);
     }
+
+    public boolean isSubtypeOf(TypeName other) {
+      return other.isSupertypeOf(this);
+    }
+
+    public boolean isSupertypeOf(TypeName other) {
+      if (this == other) {
+        return true;
+      }
+
+      // defined only for numeric types
+      if (!isNumericType() || !other.isNumericType()) {
+        return false;
+      }
+
+      switch (this) {
+        case BYTE:
+          return false;
+
+        case INT16:
+          return other == BYTE;
+
+        case INT32:
+          return other == BYTE || other == INT16;
+
+        case INT64:
+          return other == BYTE || other == INT16 || other == INT32;
+
+        case FLOAT:
+          return false;
+
+        case DOUBLE:
+          return other == FLOAT;
+
+        case DECIMAL:
+          return other == FLOAT || other == DOUBLE;
+
+        default:
+          throw new AssertionError("Unexpected numeric type: " + this);
+      }
+    }
   }
 
   /**
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
new file mode 100644
index 000..3048806edf0
--- /dev/null
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
@@ -0,0 +1,440 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Maps;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.schemas.utils.SchemaZipFold;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast<T> extends PTransform<PCollection<T>, PCollection<Row>> {
+
+  public abstract Schema outputSchema()
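The supertype relation added to `Schema.TypeName` in the diff above can be exercised in isolation. The following is a minimal sketch, not the Beam code itself: `SupertypeSketch` and its nested `Numeric` enum are hypothetical stand-ins that copy only the numeric cases of the `switch`, and the `default` branch replaces both the numeric-type guard and the `AssertionError`.

```java
class SupertypeSketch {
  /** Hypothetical stand-in for the numeric members of Schema.TypeName. */
  enum Numeric {
    BYTE, INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL;

    boolean isSupertypeOf(Numeric other) {
      if (this == other) {
        return true; // every type is a supertype of itself
      }
      switch (this) {
        case INT16:
          return other == BYTE;
        case INT32:
          return other == BYTE || other == INT16;
        case INT64:
          return other == BYTE || other == INT16 || other == INT32;
        case DOUBLE:
          return other == FLOAT;
        case DECIMAL:
          return other == FLOAT || other == DOUBLE;
        default:
          return false; // BYTE and FLOAT have no strict subtypes in the diff
      }
    }

    boolean isSubtypeOf(Numeric other) {
      return other.isSupertypeOf(this);
    }
  }

  public static void main(String[] args) {
    System.out.println(Numeric.INT64.isSupertypeOf(Numeric.INT16));
    System.out.println(Numeric.BYTE.isSubtypeOf(Numeric.INT32));
    System.out.println(Numeric.DECIMAL.isSupertypeOf(Numeric.INT64));
    System.out.println(Numeric.FLOAT.isSupertypeOf(Numeric.INT32));
  }
}
```

As written in the diff, the relation keeps integral and floating-point types apart: `DECIMAL` is a supertype of `FLOAT` and `DOUBLE` only, and no floating-point type is a supertype of an integral one.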
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162737 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 20:11 Start Date: 05/Nov/18 20:11 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#issuecomment-436018354 Merged, but if you had other changes in progress, just open another PR from the branch. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 162737) Time Spent: 5h 10m (was: 5h) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.org/jira/browse/BEAM-5918 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > There is a need for a generic transform that given two Row schemas will > convert rows between them. There must be a possibility to opt-out from > certain kind of conversions, for instance, converting ints to shorts can > cause overflow. Another example, a schema could have a nullable field, but > never have NULL value in practice, because it was filtered out. > What is needed: > - widening values (e.g., int -> long) > - narrowing (e.g., int -> short) > - runtime check for overflow while narrowing > - ignoring nullability (nullable=true -> nullable=false) > - weakening nullability (nullable=false -> nullable=true) > - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162705 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 19:07 Start Date: 05/Nov/18 19:07 Worklog Time Spent: 10m Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230873757 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import com.google.auto.value.AutoValue; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; +import java.io.Serializable; +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.utils.SchemaZipFold; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Validator validator(); + + public static Cast of(Schema outputSchema, Validator validator) { +return new AutoValue_Cast<>(outputSchema, validator); + } + + public static Cast widening(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Widening.of()); + } + + public static Cast narrowing(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Narrowing.of()); + } + + /** Describes compatibility errors during casting. 
*/ + @AutoValue + public abstract static class CompatibilityError implements Serializable { + +public abstract List path(); + +public abstract String message(); + +public static CompatibilityError create(List path, String message) { + return new AutoValue_Cast_CompatibilityError(path, message); +} + } + + /** Interface for statically validating casts. */ + public interface Validator extends Serializable { +List apply(Schema input, Schema output); + } + + /** + * Widening changes to type that can represent any possible value of the original type. + * + * Standard widening conversions: + * + * + * BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT32 to INT64, FLOAT, DOUBLE, DECIMAL + * INT64 to FLOAT, DOUBLE, DECIMAL + * FLOAT to DOUBLE, DECIMAL + * DOUBLE to DECIMAL + * + * + * Row widening: + * + * + * wider schema to schema with a subset of fields + * non-nullable fields to nullable fields + * + * + * Widening doesn't lose information about the overall magnitude in following cases: + * + * + * integral type to another integral type + * BYTE or INT16 to FLOAT, DOUBLE or DECIMAL + * INT32 to DOUBLE + * + * + * Other conversions to may cause loss of precision. + */ + public static class Widening implements Validator { +private final Fold fold = new Fold(); + +public static Widening of() { + return new Widening(); +} + +@Override +public String toString() { + return "Cast.Widening"; +} + +@Override +public List apply(final Schema input, f
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162697 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:52 Start Date: 05/Nov/18 18:52 Worklog Time Spent: 10m Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230868788 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SchemaZipFold.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.utils; + +import com.google.auto.value.AutoValue; +import com.google.common.collect.ImmutableList; +import java.io.Serializable; +import java.util.Collections; +import java.util.List; +import java.util.Optional; +import java.util.stream.Stream; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; + +/** + * Visitor that zips schemas, and accepts pairs of fields and their types. 
+ * + * Values returned by `accept` are accumulated. + */ +public abstract class SchemaZipFold implements Serializable { Review comment: It isn't optimized for few allocations or for speed, because it's called only a couple of times during graph construction, and, in my understanding, performance isn't a concern here. The reason I extracted zip and fold is that otherwise it's hard to see the actual narrowing/widening logic. I agree that it is a bit out of step with the rest of the codebase. One alternative could be creating something like `class ZippedSchemas`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 162697) Time Spent: 4h 40m (was: 4.5h) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.org/jira/browse/BEAM-5918 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > There is a need for a generic transform that given two Row schemas will > convert rows between them. There must be a possibility to opt-out from > certain kind of conversions, for instance, converting ints to shorts can > cause overflow. Another example, a schema could have a nullable field, but > never have NULL value in practice, because it was filtered out. > What is needed: > - widening values (e.g., int -> long) > - narrowing (e.g., int -> short) > - runtime check for overflow while narrowing > - ignoring nullability (nullable=true -> nullable=false) > - weakening nullability (nullable=false -> nullable=true) > - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
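The zip-and-fold idea discussed in the review comment above can be illustrated with a much smaller model. This is a rough sketch under stated assumptions, not the `SchemaZipFold` API: a schema is reduced to a map from field name to type name, and `ZipFoldSketch`, `zipFold`, and `sameTypeOnly` are hypothetical names. The point is only that the traversal (zip) and the accumulation (fold) live in one place, while the per-field rule is passed in separately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class ZipFoldSketch {
  /**
   * Pairs up fields of the two "schemas" by name (zip) and concatenates the
   * errors returned by the rule for every pair (fold).
   */
  static List<String> zipFold(
      Map<String, String> input,
      Map<String, String> output,
      BiFunction<String, String, List<String>> accept) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, String> out : output.entrySet()) {
      String inputType = input.get(out.getKey());
      if (inputType == null) {
        errors.add(out.getKey() + ": field is missing in the input schema");
        continue;
      }
      for (String error : accept.apply(inputType, out.getValue())) {
        errors.add(out.getKey() + ": " + error);
      }
    }
    return errors;
  }

  /** Hypothetical rule: only identical types are compatible. */
  static List<String> sameTypeOnly(String from, String to) {
    List<String> errors = new ArrayList<>();
    if (!from.equals(to)) {
      errors.add("can't cast " + from + " to " + to);
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, String> input = new LinkedHashMap<>();
    input.put("a", "INT32");
    input.put("b", "INT32");
    Map<String, String> output = new LinkedHashMap<>();
    output.put("a", "INT32");
    output.put("b", "INT16");
    output.put("c", "STRING");
    System.out.println(zipFold(input, output, ZipFoldSketch::sameTypeOnly));
  }
}
```

Swapping `sameTypeOnly` for a widening or narrowing rule changes the validation without touching the traversal, which appears to be the separation the comment argues for.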
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162655 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:24 Start Date: 05/Nov/18 18:24 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#issuecomment-435980683 I will wait a short while if you want to make any last changes like touching up the javadoc, then I will go ahead and merge and we can do it in follow-up smaller PRs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 162655) Time Spent: 4.5h (was: 4h 20m) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.org/jira/browse/BEAM-5918 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > There is a need for a generic transform that given two Row schemas will > convert rows between them. There must be a possibility to opt-out from > certain kind of conversions, for instance, converting ints to shorts can > cause overflow. Another example, a schema could have a nullable field, but > never have NULL value in practice, because it was filtered out. 
> What is needed: > - widening values (e.g., int -> long) > - narrowing (e.g., int -> short) > - runtime check for overflow while narrowing > - ignoring nullability (nullable=true -> nullable=false) > - weakening nullability (nullable=false -> nullable=true) > - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
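The "runtime check for overflow while narrowing" item in the issue description can be sketched as follows. This is an illustration only; `NarrowingSketch` and `narrowToInt16` are hypothetical names, and nothing here is claimed to match how the Cast transform actually performs the check.

```java
class NarrowingSketch {
  /** Narrows an INT32 value to INT16, failing loudly instead of wrapping around. */
  static short narrowToInt16(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("INT32 value " + value + " overflows INT16");
    }
    return (short) value;
  }

  public static void main(String[] args) {
    System.out.println(narrowToInt16(123));
    try {
      narrowToInt16(40000);
    } catch (ArithmeticException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A plain `(short) 40000` cast would silently wrap to a negative number, which is the overflow the issue description wants users to be able to opt out of.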
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162653 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:23 Start Date: 05/Nov/18 18:23 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230854619 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java ## @@ -338,6 +338,47 @@ public boolean isMapType() { public boolean isCompositeType() { return COMPOSITE_TYPES.contains(this); } + +public boolean isSubtypeOf(TypeName other) { + return other.isSupertypeOf(this); +} + +public boolean isSupertypeOf(TypeName other) { Review comment: In #6861 nullability is added for array elements and map values. It isn't expressed in the most natural "type system" way, but we should move towards treating a nullable `T` as `OPTIONAL` with automatic coercion to `T` rather than treating it as just a `T` with nullability as a side condition. It requires a fairly significant refactor to do so. Just something to keep in mind. It would affect this sub/supertype check. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 162653) Time Spent: 4h 20m (was: 4h 10m) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.org/jira/browse/BEAM-5918 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > There is a need for a generic transform that given two Row schemas will > convert rows between them. There must be a possibility to opt-out from > certain kind of conversions, for instance, converting ints to shorts can > cause overflow. Another example, a schema could have a nullable field, but > never have NULL value in practice, because it was filtered out. > What is needed: > - widening values (e.g., int -> long) > - narrowing (e.g., int -> short) > - runtime check for overflow while narrowing > - ignoring nullability (nullable=true -> nullable=false) > - weakening nullability (nullable=false -> nullable=true) > - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162651 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:23 Start Date: 05/Nov/18 18:23 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230858306 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/SchemaZipFold.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.utils; + +import com.google.auto.value.AutoValue; +import com.google.common.collect.ImmutableList; +import java.io.Serializable; +import java.util.Collections; +import java.util.List; +import java.util.Optional; +import java.util.stream.Stream; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; + +/** + * Visitor that zips schemas, and accepts pairs of fields and their types. 
+ * + * Values returned by `accept` are accumulated. + */ +public abstract class SchemaZipFold implements Serializable { Review comment: I understand this class, but maybe others won't love it vs nested switch statements / static recursive functions. Out of curiosity, does it not add some allocation cost? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 162651) Time Spent: 4h 10m (was: 4h) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.org/jira/browse/BEAM-5918 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > There is a need for a generic transform that given two Row schemas will > convert rows between them. There must be a possibility to opt-out from > certain kind of conversions, for instance, converting ints to shorts can > cause overflow. Another example, a schema could have a nullable field, but > never have NULL value in practice, because it was filtered out. > What is needed: > - widening values (e.g., int -> long) > - narrowing (e.g., int -> short) > - runtime check for overflow while narrowing > - ignoring nullability (nullable=true -> nullable=false) > - weakening nullability (nullable=false -> nullable=true) > - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32)) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162650 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:23 Start Date: 05/Nov/18 18:23 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230857351 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import com.google.auto.value.AutoValue; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; +import java.io.Serializable; +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.utils.SchemaZipFold; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Validator validator(); + + public static Cast of(Schema outputSchema, Validator validator) { +return new AutoValue_Cast<>(outputSchema, validator); + } + + public static Cast widening(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Widening.of()); + } + + public static Cast narrowing(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Narrowing.of()); + } + + /** Describes compatibility errors during casting. 
*/ + @AutoValue + public abstract static class CompatibilityError implements Serializable { + +public abstract List path(); + +public abstract String message(); + +public static CompatibilityError create(List path, String message) { + return new AutoValue_Cast_CompatibilityError(path, message); +} + } + + /** Interface for statically validating casts. */ + public interface Validator extends Serializable { +List apply(Schema input, Schema output); + } + + /** + * Widening changes to type that can represent any possible value of the original type. + * + * Standard widening conversions: + * + * + * BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT32 to INT64, FLOAT, DOUBLE, DECIMAL + * INT64 to FLOAT, DOUBLE, DECIMAL + * FLOAT to DOUBLE, DECIMAL + * DOUBLE to DECIMAL + * + * + * Row widening: + * + * + * wider schema to schema with a subset of fields + * non-nullable fields to nullable fields + * + * + * Widening doesn't lose information about the overall magnitude in following cases: + * + * + * integral type to another integral type + * BYTE or INT16 to FLOAT, DOUBLE or DECIMAL + * INT32 to DOUBLE Review comment: What about INT64? And why only some can go to DECIMAL? And adding nullability is widening too. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact I
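The review question about INT64 touches on which widening conversions keep every value exact. The sketch below checks this with plain Java primitives, assuming the usual mapping of INT32/INT64 to `int`/`long` and FLOAT/DOUBLE to `float`/`double`; `WideningPrecisionSketch`, `exactAsDouble`, and `exactAsFloat` are hypothetical helper names, and the comparison goes through `BigDecimal` so that the check itself does not round.

```java
import java.math.BigDecimal;

class WideningPrecisionSketch {
  /** True if converting the long to double preserves the exact value. */
  static boolean exactAsDouble(long v) {
    return new BigDecimal((double) v).compareTo(BigDecimal.valueOf(v)) == 0;
  }

  /** True if converting the int to float preserves the exact value. */
  static boolean exactAsFloat(int v) {
    // float -> double is exact, so the BigDecimal holds the float's true value
    return new BigDecimal((double) (float) v).compareTo(BigDecimal.valueOf((long) v)) == 0;
  }

  public static void main(String[] args) {
    System.out.println(exactAsDouble(1L << 53));       // 2^53 fits in a double's 53-bit significand
    System.out.println(exactAsDouble((1L << 53) + 1)); // 2^53 + 1 does not
    System.out.println(exactAsFloat(1 << 24));         // 2^24 fits in a float's 24-bit significand
    System.out.println(exactAsFloat((1 << 24) + 1));   // 2^24 + 1 does not
  }
}
```

Every `int` fits exactly in a `double`, which is consistent with the Javadoc listing INT32 to DOUBLE as lossless, while `long` to `double` and `int` to `float` keep the magnitude but can drop low-order digits.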
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162649 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:23 Start Date: 05/Nov/18 18:23 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230857163 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import com.google.auto.value.AutoValue; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; +import java.io.Serializable; +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.utils.SchemaZipFold; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Validator validator(); + + public static Cast of(Schema outputSchema, Validator validator) { +return new AutoValue_Cast<>(outputSchema, validator); + } + + public static Cast widening(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Widening.of()); + } + + public static Cast narrowing(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Narrowing.of()); + } + + /** Describes compatibility errors during casting. 
*/ + @AutoValue + public abstract static class CompatibilityError implements Serializable { + +public abstract List path(); + +public abstract String message(); + +public static CompatibilityError create(List path, String message) { + return new AutoValue_Cast_CompatibilityError(path, message); +} + } + + /** Interface for statically validating casts. */ + public interface Validator extends Serializable { +List apply(Schema input, Schema output); + } + + /** + * Widening changes to type that can represent any possible value of the original type. + * + * Standard widening conversions: + * + * + * BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT32 to INT64, FLOAT, DOUBLE, DECIMAL + * INT64 to FLOAT, DOUBLE, DECIMAL + * FLOAT to DOUBLE, DECIMAL + * DOUBLE to DECIMAL + * + * + * Row widening: + * + * + * wider schema to schema with a subset of fields + * non-nullable fields to nullable fields + * + * + * Widening doesn't lose information about the overall magnitude in following cases: + * + * + * integral type to another integral type + * BYTE or INT16 to FLOAT, DOUBLE or DECIMAL + * INT32 to DOUBLE + * + * + * Other conversions to may cause loss of precision. + */ + public static class Widening implements Validator { +private final Fold fold = new Fold(); + +public static Widening of() { + return new Widening(); +} + +@Override +public String toString() { + return "Cast.Widening"; +} + +@Override +public List apply(final Schema input
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162654 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:23 Start Date: 05/Nov/18 18:23 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230857029 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import com.google.auto.value.AutoValue; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; +import java.io.Serializable; +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.utils.SchemaZipFold; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Validator validator(); + + public static Cast of(Schema outputSchema, Validator validator) { +return new AutoValue_Cast<>(outputSchema, validator); + } + + public static Cast widening(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Widening.of()); + } + + public static Cast narrowing(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Narrowing.of()); + } + + /** Describes compatibility errors during casting. 
*/ + @AutoValue + public abstract static class CompatibilityError implements Serializable { + +public abstract List path(); + +public abstract String message(); + +public static CompatibilityError create(List path, String message) { + return new AutoValue_Cast_CompatibilityError(path, message); +} + } + + /** Interface for statically validating casts. */ + public interface Validator extends Serializable { +List apply(Schema input, Schema output); + } + + /** + * Widening changes to type that can represent any possible value of the original type. + * + * Standard widening conversions: + * + * + * BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT32 to INT64, FLOAT, DOUBLE, DECIMAL + * INT64 to FLOAT, DOUBLE, DECIMAL + * FLOAT to DOUBLE, DECIMAL + * DOUBLE to DECIMAL + * + * + * Row widening: + * + * + * wider schema to schema with a subset of fields + * non-nullable fields to nullable fields + * + * + * Widening doesn't lose information about the overall magnitude in following cases: + * + * + * integral type to another integral type + * BYTE or INT16 to FLOAT, DOUBLE or DECIMAL + * INT32 to DOUBLE + * + * + * Other conversions to may cause loss of precision. + */ + public static class Widening implements Validator { +private final Fold fold = new Fold(); + +public static Widening of() { + return new Widening(); +} + +@Override +public String toString() { + return "Cast.Widening"; +} + +@Override +public List apply(final Schema input
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162652 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 05/Nov/18 18:23 Start Date: 05/Nov/18 18:23 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r230857941 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import com.google.auto.value.AutoValue; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; +import java.io.Serializable; +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.utils.SchemaZipFold; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Validator validator(); + + public static Cast of(Schema outputSchema, Validator validator) { +return new AutoValue_Cast<>(outputSchema, validator); + } + + public static Cast widening(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Widening.of()); + } + + public static Cast narrowing(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Narrowing.of()); + } + + /** Describes compatibility errors during casting. 
*/ + @AutoValue + public abstract static class CompatibilityError implements Serializable { + +public abstract List path(); + +public abstract String message(); + +public static CompatibilityError create(List path, String message) { + return new AutoValue_Cast_CompatibilityError(path, message); +} + } + + /** Interface for statically validating casts. */ + public interface Validator extends Serializable { +List apply(Schema input, Schema output); + } + + /** + * Widening changes to type that can represent any possible value of the original type. + * + * Standard widening conversions: + * + * + * BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT32 to INT64, FLOAT, DOUBLE, DECIMAL + * INT64 to FLOAT, DOUBLE, DECIMAL + * FLOAT to DOUBLE, DECIMAL + * DOUBLE to DECIMAL + * + * + * Row widening: + * + * + * wider schema to schema with a subset of fields + * non-nullable fields to nullable fields + * + * + * Widening doesn't lose information about the overall magnitude in following cases: + * + * + * integral type to another integral type + * BYTE or INT16 to FLOAT, DOUBLE or DECIMAL + * INT32 to DOUBLE + * + * + * Other conversions to may cause loss of precision. + */ + public static class Widening implements Validator { +private final Fold fold = new Fold(); + +public static Widening of() { + return new Widening(); +} + +@Override +public String toString() { + return "Cast.Widening"; +} + +@Override +public List apply(final Schema input
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162638 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 05/Nov/18 18:12
Start Date: 05/Nov/18 18:12
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-435976579

Sorry for the delay - taking another look.

Issue Time Tracking
---
Worklog Id: (was: 162638)
Time Spent: 3h 50m (was: 3h 40m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that given two Row schemas will
> convert rows between them. There must be a possibility to opt-out from
> certain kinds of conversions, for instance, converting ints to shorts can
> cause overflow. Another example, a schema could have a nullable field, but
> never have NULL value in practice, because it was filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
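Editorial sketch of the "runtime check for overflow while narrowing" item from the issue description. It is plain Java with hypothetical names and does not claim to match what the pull request does; it only shows the difference between a checked narrowing and a bare cast, which wraps silently.

```java
/** Illustrative only: hypothetical helper, not part of Cast.java. */
final class NarrowingSketch {

  /** Narrows an INT32 value to INT16, failing instead of wrapping around. */
  static short toInt16Exact(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("INT32 value " + value + " overflows INT16");
    }
    return (short) value;
  }

  /** Narrows an INT64 value to INT32 using the JDK's checked conversion. */
  static int toInt32Exact(long value) {
    // Math.toIntExact throws ArithmeticException on overflow.
    return Math.toIntExact(value);
  }
}
```

A bare `(short) 32768` evaluates to `-32768` without any error, which is the overflow the issue description warns about.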
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=162615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-162615 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 05/Nov/18 17:46
Start Date: 05/Nov/18 17:46
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-435967943

@kennknowles Gentle ping, or, perhaps, there is somebody else who can help with the review? @akedin

Issue Time Tracking
---
Worklog Id: (was: 162615)
Time Spent: 3h 40m (was: 3.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that given two Row schemas will
> convert rows between them. There must be a possibility to opt-out from
> certain kinds of conversions, for instance, converting ints to shorts can
> cause overflow. Another example, a schema could have a nullable field, but
> never have NULL value in practice, because it was filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=161320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-161320 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 31/Oct/18 21:21 Start Date: 31/Oct/18 21:21 Worklog Time Spent: 10m Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r229872246 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import com.google.auto.value.AutoValue; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; +import java.io.Serializable; +import java.math.BigDecimal; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.schemas.utils.SchemaZipFold; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Validator validator(); + + public static Cast of(Schema outputSchema, Validator validator) { +return new AutoValue_Cast<>(outputSchema, validator); + } + + public static Cast widening(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Widening.of()); + } + + public static Cast narrowing(Schema outputSchema) { +return new AutoValue_Cast<>(outputSchema, Narrowing.of()); + } + + /** Describes compatibility errors during casting. 
*/ + @AutoValue + public abstract static class CompatibilityError implements Serializable { + +public abstract List path(); + +public abstract String message(); + +public static CompatibilityError create(List path, String message) { + return new AutoValue_Cast_CompatibilityError(path, message); +} + } + + /** Interface for statically validating casts. */ + public interface Validator extends Serializable { +List apply(Schema input, Schema output); + } + + /** + * Widening changes to type that can represent any possible value of the original type. + * + * Standard widening conversions: + * + * + * BYTE to INT16, INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT16 to INT32, INT64, FLOAT, DOUBLE, DECIMAL + * INT32 to INT64, FLOAT, DOUBLE, DECIMAL + * INT64 to FLOAT, DOUBLE, DECIMAL + * FLOAT to DOUBLE, DECIMAL + * DOUBLE to DECIMAL + * + * + * Row widening: + * + * + * wider schema to schema with a subset of fields Review comment: Doesn't match with the definition of widening, probably should be only in narrowing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 161320) Time Spent: 3.5h (was: 3h 20m) > Add Cast transform for Rows > --- > > Key: BEAM-5918 > URL: https://issues.apache.o
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=161296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-161296 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 31/Oct/18 20:17
Start Date: 31/Oct/18 20:17
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434832598

@kennknowles Thanks for the feedback. I've simplified the implementation a lot. Please take a look.

Issue Time Tracking
---
Worklog Id: (was: 161296)
Time Spent: 3h 20m (was: 3h 10m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that given two Row schemas will
> convert rows between them. There must be a possibility to opt-out from
> certain kinds of conversions, for instance, converting ints to shorts can
> cause overflow. Another example, a schema could have a nullable field, but
> never have NULL value in practice, because it was filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
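Editorial sketch of the "projection" and "ignoring nullability" items from the issue description. It models a row as a plain `Map<String, Object>` rather than Beam's `Row`, and every name in it is hypothetical. The runtime null check is an assumption about how "nullable=true -> nullable=false" could be made safe; the thread does not confirm that the pull request does it this way.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy row model, not Beam's Row or Schema. */
final class ProjectionSketch {

  /**
   * Keeps only the fields named in outputFields, in that order. Every output
   * field is treated as non-nullable, so a null value fails at runtime.
   */
  static Map<String, Object> project(Map<String, Object> row, List<String> outputFields) {
    Map<String, Object> projected = new LinkedHashMap<>();
    for (String name : outputFields) {
      if (!row.containsKey(name)) {
        throw new IllegalArgumentException("field is missing in the input: " + name);
      }
      Object value = row.get(name);
      if (value == null) {
        throw new IllegalStateException("null in non-nullable output field: " + name);
      }
      projected.put(name, value);
    }
    return projected;
  }
}
```

Projecting `{a=1, b=2}` onto `[a]` yields `{a=1}`, mirroring `Schema(a: Int32, b: Int32) -> Schema(a: Int32)`.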
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160761 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 30/Oct/18 18:28 Start Date: 30/Oct/18 18:28 Worklog Time Spent: 10m Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r229432459 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ## @@ -0,0 +1,582 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.schemas.transforms; + +import static org.apache.beam.sdk.schemas.Schema.TypeName.ARRAY; +import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16; +import static org.apache.beam.sdk.schemas.Schema.TypeName.INT32; +import static org.apache.beam.sdk.schemas.Schema.TypeName.INT64; +import static org.apache.beam.sdk.schemas.Schema.TypeName.MAP; +import static org.apache.beam.sdk.schemas.Schema.TypeName.ROW; + +import com.google.auto.value.AutoValue; +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Joiner; +import com.google.common.collect.ImmutableList; +import com.google.common.collect.ImmutableMap; +import com.google.common.collect.Maps; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.TypeName; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** Set of utilities for casting rows between schemas. */ +@Experimental(Experimental.Kind.SCHEMAS) +@AutoValue +public abstract class Cast extends PTransform, PCollection> { + + public abstract Schema outputSchema(); + + public abstract Nullability nullability(); + + public abstract Type type(); + + public abstract Shape shape(); + + /** Builder for {@link Cast}. 
*/ + @AutoValue.Builder + public abstract static class Builder { + +public abstract Builder outputSchema(Schema schema); + +public abstract Builder nullability(Nullability nullability); + +public abstract Builder type(Type type); + +public abstract Builder shape(Shape shape); + +public abstract Cast build(); + } + + public static Builder builder() { +return new AutoValue_Cast.Builder(); + } + + public static Cast to(Schema outputSchema) { +return Cast.builder() +.outputSchema(outputSchema) +.nullability(Nullability.IGNORE) +.type(Type.WIDEN) +.shape(Shape.PROJECTION) +.build(); + } + + public List compatibility(Schema inputSchema) { +return Inference.compatibility(inputSchema, outputSchema(), nullability(), type(), shape()); + } + + public void verifyCompatibility(Schema inputSchema) { +List errors = compatibility(inputSchema); + +if (!errors.isEmpty()) { + String reason = + errors + .stream() + .map(x -> Joiner.on('.').join(x.path()) + ": " + x.message()) + .collect(Collectors.joining("\n\t")); + + throw new IllegalArgumentException("Cast isn't compatible:\n\t" + reason); +} + } + + @Override + public PCollection expand(PCollection input) { +Schema inputSchema = input.getSchema(); + +verifyCompatibility(inputSchema); + +return input +.apply( +ParDo.of( +new DoFn() { + // TODO: This should be the same as resolved so that Beam knows which fields + // are being accessed. Currently Beam only supports
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160732 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:59
Start Date: 30/Oct/18 16:59
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434384719

Another approach to the problem would be to implement functional-style traversals over Schemas and Rows, and to implement casting on top of them. The same traversals could then be used to implement parsing of "special" fields if needed.

Issue Time Tracking
---
Worklog Id: (was: 160732)
Time Spent: 3h (was: 2h 50m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 3h
> Remaining Estimate: 0h
>
> There is a need for a generic transform that given two Row schemas will
> convert rows between them. There must be a possibility to opt-out from
> certain kinds of conversions, for instance, converting ints to shorts can
> cause overflow. Another example, a schema could have a nullable field, but
> never have NULL value in practice, because it was filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))
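Editorial sketch of what a "functional-style traversal" over two schemas might look like. It uses a toy schema model, not Beam's `Schema`, and all names (`ToySchema`, `zipFold`, `walk`) are hypothetical. The idea illustrated is a fold that walks the input and output schemas in parallel and accumulates one error per incompatible path, formatted as `path: message` like the error output in the quoted `verifyCompatibility` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Illustrative only: a toy model, not Beam's Schema or SchemaZipFold. */
final class SchemaFoldSketch {

  /** Field name -> either a primitive type name (String) or a nested ToySchema. */
  static final class ToySchema {
    final Map<String, Object> fields = new LinkedHashMap<>();

    ToySchema field(String name, Object type) {
      fields.put(name, type);
      return this;
    }
  }

  /**
   * Walks both schemas in parallel. For each pair of primitive leaves at the
   * same path, check returns null when the cast is fine or an error message.
   */
  static List<String> zipFold(
      ToySchema input, ToySchema output, BiFunction<String, String, String> check) {
    List<String> errors = new ArrayList<>();
    walk("", input, output, check, errors);
    return errors;
  }

  private static void walk(
      String prefix,
      ToySchema input,
      ToySchema output,
      BiFunction<String, String, String> check,
      List<String> errors) {
    for (Map.Entry<String, Object> entry : output.fields.entrySet()) {
      String path = prefix + entry.getKey();
      Object in = input.fields.get(entry.getKey());
      Object out = entry.getValue();
      if (in == null) {
        errors.add(path + ": field is missing in the input schema");
      } else if (in instanceof ToySchema && out instanceof ToySchema) {
        // Recurse into nested rows, extending the dotted path.
        walk(path + ".", (ToySchema) in, (ToySchema) out, check, errors);
      } else if (in instanceof String && out instanceof String) {
        String message = check.apply((String) in, (String) out);
        if (message != null) {
          errors.add(path + ": " + message);
        }
      } else {
        errors.add(path + ": can't cast between a row and a primitive");
      }
    }
  }
}
```

Because the traversal is separate from the `check` function, the same walk could serve both a widening and a narrowing validator by swapping the function passed in.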
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160730 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:55
Start Date: 30/Oct/18 16:55
Worklog Time Spent: 10m
Work Description: kanterov edited a comment on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434380987

@kennknowles yes, I agree, it's very controversial, but there are cases where it makes a lot of sense, for instance, BigQuery exports:

- BQ: `DATE`, AVRO: `string`
- BQ: `DATETIME`, AVRO: `string`
- BQ: `TIMESTAMP`, AVRO: `long`, `logicalType=timestamp-micros`

It needs to be converted to `Row`. The idea is not to have a global registry but override it per transform. For instance:

```java
Cast
    .to(...)
    .with(StringToDateConversion.of())
    .with(...)
    .build()
    .apply(...)
```

Issue Time Tracking
---
Worklog Id: (was: 160730)
Time Spent: 2h 50m (was: 2h 40m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 2h 50m
> Remaining Estimate: 0h
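To make the BigQuery export cases in the comment above concrete, here is a minimal sketch of the three value conversions it lists, written against `java.time` only. The class and method names are hypothetical and do not come from Beam or the pull request, and the sketch assumes the exported strings are in ISO-8601 form (for example `2018-10-30` and `2018-10-30T16:55:00`), which should be checked against real export files.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;

/** Hypothetical helpers; none of these names come from Beam or the pull request. */
final class BigQueryExportConversions {
  private BigQueryExportConversions() {}

  /** BQ DATE arrives as an Avro string; assumes ISO-8601 form such as "2018-10-30". */
  static LocalDate parseDate(String value) {
    return LocalDate.parse(value);
  }

  /** BQ DATETIME arrives as an Avro string; assumes ISO-8601 form with a 'T' separator. */
  static LocalDateTime parseDateTime(String value) {
    return LocalDateTime.parse(value);
  }

  /** BQ TIMESTAMP arrives as an Avro long with logicalType=timestamp-micros. */
  static Instant fromEpochMicros(long micros) {
    // floorDiv/floorMod keep the nanosecond part non-negative for pre-1970 values.
    long seconds = Math.floorDiv(micros, 1_000_000L);
    long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
    return Instant.ofEpochSecond(seconds, nanos);
  }

  public static void main(String[] args) {
    System.out.println(parseDate("2018-10-30"));
    System.out.println(parseDateTime("2018-10-30T16:55:00"));
    System.out.println(fromEpochMicros(1_500_000L));
  }
}
```

Each method is the kind of single-purpose conversion that a per-transform override such as `StringToDateConversion` would presumably wrap; how it plugs into `Cast` is left open, as in the comment.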
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160728 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:50
Start Date: 30/Oct/18 16:50
Worklog Time Spent: 10m
Work Description: kanterov edited a comment on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434380987

@kennknowles yes, I agree, it's very controversial, but there are cases where it makes a lot of sense, for instance, BigQuery exports:

- BQ: `DATE`, AVRO: `string`
- BQ: `DATETIME`, AVRO: `string`
- BQ: `TIMESTAMP`, AVRO: `long`, `logicalType=timestamp-micros`

It needs to be converted to `Row`. The idea is not to have a global registry but override it per transform. For instance:

```java
Cast
    .to(...)
    .register(StringToDateConversion.of())
    .register(...)
    .build()
    .apply(...)
```

Issue Time Tracking
---
Worklog Id: (was: 160728)
Time Spent: 2h 40m (was: 2.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160727 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:49
Start Date: 30/Oct/18 16:49
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434380987

@kennknowles yes, I agree, it's very controversial, but there are cases where it makes a lot of sense, for instance, BigQuery exports:

- BQ: `DATE`, AVRO: `string`
- BQ: `DATETIME`, AVRO: `string`
- BQ: `TIMESTAMP`, AVRO: `long`, `logicalType=timestamp-micros`

The idea is not to have a global registry but override it per transform. For instance:

```java
Cast
    .to(...)
    .register(StringToDateConversion.of())
    .register(...)
    .build()
    .apply(...)
```

Issue Time Tracking
---
Worklog Id: (was: 160727)
Time Spent: 2.5h (was: 2h 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160724 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:42
Start Date: 30/Oct/18 16:42
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434377851

I'm slightly more worried about fancy conversions like parsing. It is just a bit more of a big design decision. Casting to add/remove nullability or narrow/widen integer types is more simple in my mind.

Issue Time Tracking
---
Worklog Id: (was: 160724)
Time Spent: 2h 20m (was: 2h 10m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
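The "simple" casts mentioned in the comment above (widening, narrowing, and dropping nullability) can be sketched in plain Java as follows. This is only an illustration of the runtime checks involved; the class and method names are hypothetical and are not part of Beam or of the pull request.

```java
import java.util.Objects;

/** Hypothetical helpers illustrating widening, checked narrowing, and null checks. */
final class IntegerCasts {
  private IntegerCasts() {}

  /** Widening int -> long is always lossless, so no check is needed. */
  static long widen(int value) {
    return value;
  }

  /** Narrowing int -> short, failing instead of silently wrapping on overflow. */
  static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("short overflow: " + value);
    }
    return (short) value;
  }

  /** Narrowing long -> int; Math.toIntExact throws ArithmeticException on overflow. */
  static int narrowToInt(long value) {
    return Math.toIntExact(value);
  }

  /** Casting nullable -> non-nullable: the value itself must be checked at runtime. */
  static <T> T requireValue(T value, String fieldName) {
    return Objects.requireNonNull(value, "null in non-nullable field: " + fieldName);
  }

  public static void main(String[] args) {
    System.out.println(widen(42));
    System.out.println(narrowToShort(1234));
    System.out.println(narrowToInt(5_000L));
    System.out.println(requireValue("x", "name"));
  }
}
```

Widening and weakening nullability need no runtime work, which is presumably why they feel less contentious than parsing; narrowing and dropping nullability only need a per-value check like the ones shown.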
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160723 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:40
Start Date: 30/Oct/18 16:40
Worklog Time Spent: 10m
Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229386957

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
##
@@ -0,0 +1,582 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ARRAY;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT32;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT64;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.MAP;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ROW;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Maps;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast extends PTransform, PCollection> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Nullability nullability();
+
+  public abstract Type type();
+
+  public abstract Shape shape();
+
+  /** Builder for {@link Cast}. */
+  @AutoValue.Builder
+  public abstract static class Builder {
+
+    public abstract Builder outputSchema(Schema schema);
+
+    public abstract Builder nullability(Nullability nullability);
+
+    public abstract Builder type(Type type);
+
+    public abstract Builder shape(Shape shape);
+
+    public abstract Cast build();
+  }
+
+  public static Builder builder() {
+    return new AutoValue_Cast.Builder();
+  }
+
+  public static Cast to(Schema outputSchema) {
+    return Cast.builder()
+        .outputSchema(outputSchema)
+        .nullability(Nullability.IGNORE)
+        .type(Type.WIDEN)
+        .shape(Shape.PROJECTION)
+        .build();
+  }
+
+  public List compatibility(Schema inputSchema) {
+    return Inference.compatibility(inputSchema, outputSchema(), nullability(), type(), shape());
+  }
+
+  public void verifyCompatibility(Schema inputSchema) {
+    List errors = compatibility(inputSchema);
+
+    if (!errors.isEmpty()) {
+      String reason =
+          errors
+              .stream()
+              .map(x -> Joiner.on('.').join(x.path()) + ": " + x.message())
+              .collect(Collectors.joining("\n\t"));
+
+      throw new IllegalArgumentException("Cast isn't compatible:\n\t" + reason);
+    }
+  }
+
+  @Override
+  public PCollection expand(PCollection input) {
+    Schema inputSchema = input.getSchema();
+
+    verifyCompatibility(inputSchema);
+
+    return input
+        .apply(
+            ParDo.of(
+                new DoFn() {
+                  // TODO: This should be the same as resolved so that Beam knows which fields
+                  // are being accessed. Currently Beam only supports
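
The `verifyCompatibility` method in the quoted diff collects every incompatibility and reports them in a single exception, one `path: message` entry per line. The standalone sketch below reproduces that reporting pattern so the resulting message format is easier to see. The `Problem` type is a hypothetical stand-in for the element type of the error list (the type name is not visible in the archived diff), and `String.join` replaces the Guava `Joiner` used in the original.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Standalone illustration of the error-aggregation pattern; not the Beam implementation. */
final class CompatibilityReport {
  private CompatibilityReport() {}

  /** Hypothetical stand-in for the error type exposing path() and message() in the diff. */
  static final class Problem {
    final List<String> path;
    final String message;

    Problem(List<String> path, String message) {
      this.path = path;
      this.message = message;
    }
  }

  /** Renders each problem as "dotted.path: message", separated by newline plus tab. */
  static String describe(List<Problem> problems) {
    return problems.stream()
        .map(p -> String.join(".", p.path) + ": " + p.message)
        .collect(Collectors.joining("\n\t"));
  }

  /** Throws once, listing every problem, instead of failing on the first one. */
  static void verify(List<Problem> problems) {
    if (!problems.isEmpty()) {
      throw new IllegalArgumentException("Cast isn't compatible:\n\t" + describe(problems));
    }
  }

  public static void main(String[] args) {
    verify(Collections.emptyList()); // no problems, so nothing is thrown
    List<Problem> problems =
        Arrays.asList(
            new Problem(Arrays.asList("user", "age"), "can't cast INT64 to INT16"),
            new Problem(Arrays.asList("name"), "field is missing"));
    System.out.println(describe(problems));
  }
}
```

Reporting all problems at pipeline construction time lets a user fix a schema mismatch in one pass rather than discovering the fields one failure at a time.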
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160722 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:37
Start Date: 30/Oct/18 16:37
Worklog Time Spent: 10m
Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229385758

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
##
@@ -0,0 +1,582 @@
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160720 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:36
Start Date: 30/Oct/18 16:36
Worklog Time Spent: 10m
Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229385262

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
##
@@ -0,0 +1,582 @@
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160717 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:35
Start Date: 30/Oct/18 16:35
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434375201

@kennknowles thanks! I'm thinking about getting rid of enumerations, and use providers instead, something like:

```
interface Provider {
  Optional get(ConversionRegistry registry, FieldType inputType, FieldType outputType);
}
```

Motivation is to be able to compose "custom" conversions, for instance, STRING to DATETIME. As well as in-house types with special behavior that can't be put into Beam.

Issue Time Tracking
---
Worklog Id: (was: 160717)
Time Spent: 1.5h (was: 1h 20m)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160719 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:35
Start Date: 30/Oct/18 16:35
Worklog Time Spent: 10m
Work Description: kanterov edited a comment on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434375201

@kennknowles thanks! I'm thinking about getting rid of enumerations, and use providers with registry instead, something like:

```
interface Provider {
  Optional get(ConversionRegistry registry, FieldType inputType, FieldType outputType);
}
```

Motivation is to be able to compose "custom" conversions, for instance, STRING to DATETIME. As well as in-house types with special behavior that can't be put into Beam.

Issue Time Tracking
---
Worklog Id: (was: 160719)
Time Spent: 1h 40m (was: 1.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
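One way to read the provider proposal above is sketched below: each provider either returns a conversion for a pair of types or declines, and it receives the registry so that it can look up conversions for nested types. Everything here is an assumption made for illustration. Plain strings stand in for Beam's `FieldType`, the return type of `get` is guessed as an optional function (the type argument is not visible in the archived comment), and `Registry`, `STRING_TO_DATE`, and `ARRAY` are hypothetical names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of the provider idea; plain strings stand in for Beam's FieldType. */
final class ConversionSketch {
  private ConversionSketch() {}

  /** Same shape as the proposed Provider: it may ask the registry for nested conversions. */
  interface Provider {
    Optional<Function<Object, Object>> get(Registry registry, String inputType, String outputType);
  }

  /** Per-transform registry (not global): providers are tried in registration order. */
  static final class Registry {
    private final List<Provider> providers = new ArrayList<>();

    Registry register(Provider provider) {
      providers.add(provider);
      return this;
    }

    Optional<Function<Object, Object>> find(String inputType, String outputType) {
      for (Provider provider : providers) {
        Optional<Function<Object, Object>> conversion = provider.get(this, inputType, outputType);
        if (conversion.isPresent()) {
          return conversion;
        }
      }
      return Optional.empty();
    }
  }

  /** Custom STRING -> DATE conversion; assumes ISO-8601 dates such as "2018-10-30". */
  static final Provider STRING_TO_DATE =
      (registry, inputType, outputType) -> {
        if ("STRING".equals(inputType) && "DATE".equals(outputType)) {
          Function<Object, Object> parse = value -> LocalDate.parse((String) value);
          return Optional.of(parse);
        }
        return Optional.empty();
      };

  /** ARRAY<X> -> ARRAY<Y>, composed from whatever X -> Y conversion the registry can find. */
  static final Provider ARRAY =
      (registry, inputType, outputType) -> {
        if (!isArray(inputType) || !isArray(outputType)) {
          return Optional.empty();
        }
        return registry
            .find(elementType(inputType), elementType(outputType))
            .map(
                elementConversion -> {
                  Function<Object, Object> convertAll =
                      value ->
                          ((List<?>) value)
                              .stream().map(elementConversion).collect(Collectors.toList());
                  return convertAll;
                });
      };

  private static boolean isArray(String type) {
    return type.startsWith("ARRAY<") && type.endsWith(">");
  }

  private static String elementType(String type) {
    return type.substring("ARRAY<".length(), type.length() - 1);
  }

  public static void main(String[] args) {
    Registry registry = new Registry().register(STRING_TO_DATE).register(ARRAY);
    Object dates =
        registry.find("ARRAY<STRING>", "ARRAY<DATE>").get().apply(Arrays.asList("2018-10-30"));
    System.out.println(dates);
  }
}
```

Passing the registry into each provider is what makes composition possible: the array provider knows nothing about dates, yet it can convert an array of strings to an array of dates once the string-to-date provider is registered alongside it.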
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160714 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:30
Start Date: 30/Oct/18 16:30
Worklog Time Spent: 10m
Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229382903

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java
##
@@ -0,0 +1,582 @@
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160713 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:28
Start Date: 30/Oct/18 16:28
Worklog Time Spent: 10m
Work Description: kanterov commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#discussion_r229382088

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ##
@@ -0,0 +1,582 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.transforms;
+
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ARRAY;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT16;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT32;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.INT64;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.MAP;
+import static org.apache.beam.sdk.schemas.Schema.TypeName.ROW;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Joiner;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Maps;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/** Set of utilities for casting rows between schemas. */
+@Experimental(Experimental.Kind.SCHEMAS)
+@AutoValue
+public abstract class Cast extends PTransform, PCollection> {
+
+  public abstract Schema outputSchema();
+
+  public abstract Nullability nullability();
+
+  public abstract Type type();
+
+  public abstract Shape shape();
+
+  /** Builder for {@link Cast}. */
+  @AutoValue.Builder
+  public abstract static class Builder {
+
+    public abstract Builder outputSchema(Schema schema);
+
+    public abstract Builder nullability(Nullability nullability);
+
+    public abstract Builder type(Type type);
+
+    public abstract Builder shape(Shape shape);
+
+    public abstract Cast build();
+  }
+
+  public static Builder builder() {
+    return new AutoValue_Cast.Builder();
+  }
+
+  public static Cast to(Schema outputSchema) {
+    return Cast.builder()
+        .outputSchema(outputSchema)
+        .nullability(Nullability.IGNORE)
+        .type(Type.WIDEN)
+        .shape(Shape.PROJECTION)
+        .build();
+  }
+
+  public List compatibility(Schema inputSchema) {
+    return Inference.compatibility(inputSchema, outputSchema(), nullability(), type(), shape());
+  }
+
+  public void verifyCompatibility(Schema inputSchema) {
+    List errors = compatibility(inputSchema);
+
+    if (!errors.isEmpty()) {
+      String reason =
+          errors
+              .stream()
+              .map(x -> Joiner.on('.').join(x.path()) + ": " + x.message())
+              .collect(Collectors.joining("\n\t"));
+
+      throw new IllegalArgumentException("Cast isn't compatible:\n\t" + reason);
+    }
+  }
+
+  @Override
+  public PCollection expand(PCollection input) {
+    Schema inputSchema = input.getSchema();
+
+    verifyCompatibility(inputSchema);
+
+    return input
+        .apply(
+            ParDo.of(
+                new DoFn() {
+                  // TODO: This should be the same as resolved so that Beam knows which fields
+                  // are being accessed. Currently Beam only supports
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160706 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 30/Oct/18 16:20 Start Date: 30/Oct/18 16:20 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r229376438 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ##
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160704 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 30/Oct/18 16:20 Start Date: 30/Oct/18 16:20 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r229377635 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ##
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160707 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 30/Oct/18 16:20 Start Date: 30/Oct/18 16:20 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r229378175 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ##
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160705 ] ASF GitHub Bot logged work on BEAM-5918: Author: ASF GitHub Bot Created on: 30/Oct/18 16:20 Start Date: 30/Oct/18 16:20 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6888: [BEAM-5918] [WIP] Add Cast transform for Rows URL: https://github.com/apache/beam/pull/6888#discussion_r229376846 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Cast.java ##
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160703 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:15
Start Date: 30/Oct/18 16:15
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] [WIP] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434366781

I want to redo the part that resolves how to convert schemas.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 160703)
Time Spent: 40m (was: 0.5h)

> Add Cast transform for Rows
> ---
>
> Key: BEAM-5918
> URL: https://issues.apache.org/jira/browse/BEAM-5918
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Gleb Kanterov
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> There is a need for a generic transform that, given two Row schemas, will
> convert rows between them. It must be possible to opt out of certain kinds
> of conversions; for instance, converting ints to shorts can cause overflow.
> As another example, a schema could have a nullable field that never holds a
> NULL value in practice, because NULLs were filtered out.
> What is needed:
> - widening values (e.g., int -> long)
> - narrowing (e.g., int -> short)
> - runtime check for overflow while narrowing
> - ignoring nullability (nullable=true -> nullable=false)
> - weakening nullability (nullable=false -> nullable=true)
> - projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
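The widening and narrowing requirements in the BEAM-5918 issue description (int -> long, and int -> short with a runtime overflow check) can be illustrated in plain Java. This is only a sketch under stated assumptions: the class `NarrowingSketch` and its methods are hypothetical names invented for illustration, and this is not the code from the pull request.

```java
// Hypothetical illustration only; not part of the Beam Cast transform.
final class NarrowingSketch {
  private NarrowingSketch() {}

  /** Widening int -> long never loses information, so no check is needed. */
  static long widenToLong(int value) {
    return value;
  }

  /** Narrowing int -> short: fail at runtime instead of silently wrapping around. */
  static short narrowToShort(int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("int value " + value + " overflows short");
    }
    return (short) value;
  }

  public static void main(String[] args) {
    System.out.println(widenToLong(42));
    System.out.println(narrowToShort(1234));
    try {
      narrowToShort(40000); // outside the short range, so the check rejects it
    } catch (ArithmeticException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Without the range check, the plain `(short)` cast would wrap out-of-range values around, which is the overflow hazard the issue wants callers to be able to opt out of.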
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160695 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:06
Start Date: 30/Oct/18 16:06
Worklog Time Spent: 10m
Work Description: kanterov commented on issue #6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434363483

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 160695)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160696 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 16:07
Start Date: 30/Oct/18 16:07
Worklog Time Spent: 10m
Work Description: kanterov removed a comment on issue #6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888#issuecomment-434363483

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 160696)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-5918) Add Cast transform for Rows
[ https://issues.apache.org/jira/browse/BEAM-5918?focusedWorklogId=160685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-160685 ]

ASF GitHub Bot logged work on BEAM-5918:

Author: ASF GitHub Bot
Created on: 30/Oct/18 15:47
Start Date: 30/Oct/18 15:47
Worklog Time Spent: 10m
Work Description: kanterov opened a new pull request #6888: [BEAM-5918] Add Cast transform for Rows
URL: https://github.com/apache/beam/pull/6888

Casts rows from one schema into another. Implements:

- widening values (e.g., int -> long), to be extended with more conversions
- narrowing (e.g., int -> short), to be extended with more conversions
- ignoring nullability (nullable=true -> nullable=false)
- weakening nullability (nullable=false -> nullable=true)
- projection (Schema(a: Int32, b: Int32) -> Schema(a: Int32))

This would be very useful for Row-based IOs. For instance, BeamBigQueryTable could be implemented with org.apache.beam.sdk.schemas.utils.AvroUtils and Cast, which would make it more flexible; at present it is very restrictive about the schema. Another example is reading an Avro GenericRecord as a user-provided POJO, [BEAM-5807](https://issues.apache.org/jira/browse/BEAM-5807).

I want to get an initial round of feedback before polishing the Javadoc, API, etc.

Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
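The nullability and projection rules listed in the pull request description can be sketched as a schema compatibility check that collects one message per offending field, in the spirit of the "path: message" errors built by `verifyCompatibility` in the quoted diff. Everything below is an assumption made for illustration: `CompatibilitySketch`, its `Field` class, and the `ignoreNullability` flag are hypothetical stand-ins, not Beam's `Schema` or `Cast` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; a schema is modelled as field name -> Field.
final class CompatibilitySketch {
  private CompatibilitySketch() {}

  /** Stand-in for a schema field: a type name plus a nullable flag. */
  static final class Field {
    final String type;
    final boolean nullable;

    Field(String type, boolean nullable) {
      this.type = type;
      this.nullable = nullable;
    }
  }

  /** Returns one message per incompatible output field; an empty list means the cast is allowed. */
  static List<String> compatibility(
      Map<String, Field> input, Map<String, Field> output, boolean ignoreNullability) {
    List<String> errors = new ArrayList<>();
    // Only output fields are checked, so extra input fields are dropped (projection).
    for (Map.Entry<String, Field> entry : output.entrySet()) {
      String name = entry.getKey();
      Field out = entry.getValue();
      Field in = input.get(name);
      if (in == null) {
        errors.add(name + ": missing from the input schema");
        continue;
      }
      if (!in.type.equals(out.type)) {
        errors.add(name + ": can't cast " + in.type + " to " + out.type);
      }
      // nullable=false -> nullable=true (weakening) is always safe;
      // nullable=true -> nullable=false is allowed only when the caller opts in.
      if (in.nullable && !out.nullable && !ignoreNullability) {
        errors.add(name + ": can't cast a nullable field to a non-nullable one");
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, Field> input = new LinkedHashMap<>();
    input.put("a", new Field("INT32", true));
    input.put("b", new Field("INT32", false));
    Map<String, Field> output = new LinkedHashMap<>();
    output.put("a", new Field("INT32", false));

    System.out.println(compatibility(input, output, true));
    System.out.println(compatibility(input, output, false));
  }
}
```

Collecting every error before failing, rather than stopping at the first one, lets a caller see all incompatible fields in a single exception message.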