[
https://issues.apache.org/jira/browse/BEAM-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418335#comment-17418335
]
Brian Hulette commented on BEAM-12921:
--------------------------------------
I can repro this by copying the test you provided into SelectTest. The test
passes despite the fact that the assertion is incorrect.
I looked into why it's happening. It looks like PAssert eagerly encodes
elements using the actual PCollection's Coder. In this case that's
projectedOutput's coder, which has the correct schema. RowCoder does not check
for the schema mismatch, so it encodes the elements, which end up encoded
identically to the actual output.
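
To make the mechanism concrete, here is a toy model in plain Java. It is not Beam's code: `ToyRow` and `PositionalEncoder` are hypothetical stand-ins for Row and RowCoder, assuming only that the coder writes values in field order and never reads the field names. Under that assumption, two rows that differ only in a field name serialize to the same bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Row: field names plus an (int, String) value pair.
final class ToyRow {
    final List<String> fieldNames;
    final int id;
    final String text;

    ToyRow(List<String> fieldNames, int id, String text) {
        this.fieldNames = fieldNames;
        this.id = id;
        this.text = text;
    }
}

// Hypothetical stand-in for a coder that encodes values by position only.
final class PositionalEncoder {
    static byte[] encode(ToyRow row) {
        byte[] utf8 = row.text.getBytes(StandardCharsets.UTF_8);
        // Layout: int value, string length, string bytes. row.fieldNames is never consulted.
        return ByteBuffer.allocate(8 + utf8.length)
            .putInt(row.id)
            .putInt(utf8.length)
            .put(utf8)
            .array();
    }
}

final class PositionalEncodingDemo {
    public static void main(String[] args) {
        ToyRow actual = new ToyRow(Arrays.asList("appId", "description"), 1, "Recruiter");
        ToyRow expected = new ToyRow(Arrays.asList("appId", "randomName"), 1, "Recruiter");

        boolean sameNames = actual.fieldNames.equals(expected.fieldNames);
        boolean sameBytes =
            Arrays.equals(PositionalEncoder.encode(actual), PositionalEncoder.encode(expected));

        System.out.println("names equal: " + sameNames); // prints "names equal: false"
        System.out.println("bytes equal: " + sameBytes); // prints "bytes equal: true"
    }
}
```

Any comparison done on the encoded bytes therefore cannot see the renamed field.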
We could fix this if RowCoder validated the Row's schema before encoding, but
I'm not sure if we want to incur that overhead (CC [~reuvenlax] in case you
have any ideas).
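
A rough sketch of what such a check could look like, again as a toy model in plain Java rather than a patch to RowCoder. `CheckedEncoder` is a hypothetical name, and the sketch assumes a schema can be reduced to a list of field names; the comparison it performs on every element is the overhead in question:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical coder bound to one list of field names (its "schema").
final class CheckedEncoder {
    private final List<String> schemaFieldNames;

    CheckedEncoder(List<String> schemaFieldNames) {
        this.schemaFieldNames = schemaFieldNames;
    }

    // Encodes (id, text) by position, but only after comparing field names.
    byte[] encode(List<String> rowFieldNames, int id, String text) {
        // This comparison runs once per encoded element.
        if (!schemaFieldNames.equals(rowFieldNames)) {
            throw new IllegalArgumentException(
                "Row fields " + rowFieldNames + " do not match coder fields " + schemaFieldNames);
        }
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + utf8.length)
            .putInt(id)
            .putInt(utf8.length)
            .put(utf8)
            .array();
    }
}

final class CheckedEncoderDemo {
    public static void main(String[] args) {
        CheckedEncoder coder = new CheckedEncoder(Arrays.asList("appId", "description"));

        // Matching field names: encoded as before.
        coder.encode(Arrays.asList("appId", "description"), 1, "Recruiter");

        // Renamed field: rejected before any bytes are produced.
        try {
            coder.encode(Arrays.asList("appId", "randomName"), 1, "Recruiter");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, the incorrect assertion in the reported test would fail at encode time instead of passing silently.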
> PAssert ignore the Schema fields names for testing
> ----------------------------------------------------
>
> Key: BEAM-12921
> URL: https://issues.apache.org/jira/browse/BEAM-12921
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Sanil Jain
> Priority: P2
>
> Found this bug while testing the Select operator: field names are ignored by
> PAssert, so the code below passes.
> Beam version: 2.26.0.8
> {code:java}
> private static final Schema APP_SCHEMA = Schema.builder()
>     .addInt32Field("appId")
>     .addStringField("description")
>     .addFloatField("rating")
>     .build();
>
> @Test
> public void testProjectOperator() {
>   PCollection<Row> projectedOutput =
>       generateTestRow(pipeline).apply(Select.fieldNames("appId", "description"));
>   // Modified schema with renamed field
>   Schema modifiedSchema = Schema.builder()
>       .addInt32Field("appId")
>       .addStringField("randomName") // this should ideally break
>       .build();
>   PAssert.that(projectedOutput).containsInAnyOrder(
>       Row.withSchema(modifiedSchema).addValues(-8, "Invalid").build(),
>       Row.withSchema(modifiedSchema).addValues(0, "Invalid").build(),
>       Row.withSchema(modifiedSchema).addValues(1, "Recruiter").build(),
>       Row.withSchema(modifiedSchema).addValues(2, "Hirein").build(),
>       Row.withSchema(modifiedSchema).addValues(1, "Workplace").build()
>   );
>   pipeline.run().waitUntilFinish();
> }
>
> public static PCollection<Row> generateTestRow(Pipeline pipeline) {
>   // Create a concrete row with that type.
>   return PBegin
>       .in(pipeline)
>       .apply(Create.of(
>               Row.withSchema(APP_SCHEMA).addValues(-8, "Invalid", 0f).build(),
>               Row.withSchema(APP_SCHEMA).addValues(0, "Invalid", -1.1f).build(),
>               Row.withSchema(APP_SCHEMA).addValues(1, "Recruiter", 4.2f).build(),
>               Row.withSchema(APP_SCHEMA).addValues(2, "Hirein", 3.5f).build(),
>               Row.withSchema(APP_SCHEMA).addValues(1, "Workplace", 3f).build())
>           .withCoder(RowCoder.of(APP_SCHEMA)));
> }
> {code}
>
>