[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution
n3nash commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-841601332 @vinothchandar We have found a way to avoid `BaseAvroPayloadWithSchema`. We are validating some more corner cases with respect to schema evolution in the PR @nsivabalan has pointed out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution
n3nash commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-835486764 At a high level, my understanding is there are 3 issues 1. Removing namespace could cause backwards compatibility -> Validated with [this](https://github.com/apache/hudi/pull/2928) PR that it should be OK to make this change. 2. Records with older schema received without a user defined schema. This breaks the `getInsertRecord` API since the writer schema is smaller than the reader schema and bytes to generic record conversion breaks, addressed by @nsivabalan [diff](https://github.com/apache/hudi/pull/2927) 3. For general schema evolution to work when converting from DataFrame to Avro, we need to ensure that new elements added in the schema have a default value "null" using the UNION type as follows `name: a, type ["null", ]`. Spark's internal converter breaks the Avro spec of creating a UNION by reversing the order of null as follows `name: a, type [, "null"]` , @nsivabalan fixed this with [this](https://github.com/apache/hudi/pull/2765) PR by relying on Hudi's internal ConversionHelper to ensure that "null" is appended first. With all of these 3 fixes, we should be able to land this PR. @sathyaprakashg Let us know if there were any other concerns that you came across. @nsivabalan I can confirm that (3) should be good to land. I've approved the diff and @bvaradar will be taking a look at this later today, once he confirms, we can land both (3) and then (2) with the assumption that (1) validates both. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution
n3nash commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-809974270 @sathyaprakashg Can you rebase this PR ? I'll wait for sometime before I'll give it a shot by myself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution
n3nash commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-693155102 @sathyaprakashg thanks for the explanation, please update the PR accordingly, rebase and squash the commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution
n3nash commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-688954187 @sathyaprakashg Thanks for looking into this. I see that the `org.apache.spark.sql.avro.SchemaConverters` uses the `fixed` name so it's difficult to workaround it. Your approach sounds fine to me as long as it does not break any existing tests. Left a couple of comments for changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on pull request #2012: HUDI-1129 Deltastreamer Add support for schema evolution
n3nash commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-685269655 @sathyaprakashg Looks good to me. Can you please see why the build is failing ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org