[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2021-05-14 Thread GitBox


n3nash commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-841601332


   @vinothchandar We have found a way to avoid `BaseAvroPayloadWithSchema`. We 
are validating some more corner cases with respect to schema evolution in the 
PR @nsivabalan has pointed out. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2021-05-08 Thread GitBox


n3nash commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-835486764


   At a high level, my understanding is there are 3 issues
   
   1. Removing namespace could cause backwards compatibility -> Validated with 
[this](https://github.com/apache/hudi/pull/2928) PR that it should be OK to 
make this change. 
   2. Records with older schema received without a user defined schema. This 
breaks the `getInsertRecord` API since the writer schema is smaller than the 
reader schema and bytes to generic record conversion breaks, addressed by 
@nsivabalan [diff](https://github.com/apache/hudi/pull/2927)
   3. For general schema evolution to work when converting from DataFrame to 
Avro, we need to ensure that new elements added in the schema have a default 
value "null" using the UNION type as follows `name: a, type ["null", ]`. 
Spark's internal converter breaks the Avro spec of creating a UNION by 
reversing the order of null as follows `name: a, type [, "null"]` , 
@nsivabalan fixed this with [this](https://github.com/apache/hudi/pull/2765) PR 
by relying on Hudi's internal ConversionHelper to ensure that "null" is 
appended first. 
   
   With all of these 3 fixes, we should be able to land this PR. 
@sathyaprakashg Let us know if there were any other concerns that you came 
across. 
   
   @nsivabalan I can confirm that (3) should be good to land. I've approved the 
diff and @bvaradar will be taking a look at this later today, once he confirms, 
we can land both (3) and then (2) with the assumption that (1) validates both. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2021-03-30 Thread GitBox


n3nash commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-809974270


   @sathyaprakashg Can you rebase this PR ? I'll wait for sometime before I'll 
give it a shot by myself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2020-09-15 Thread GitBox


n3nash commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-693155102


   @sathyaprakashg thanks for the explanation, please update the PR 
accordingly, rebase and squash the commits.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2020-09-08 Thread GitBox


n3nash commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-688954187


   @sathyaprakashg Thanks for looking into this. I see that the 
`org.apache.spark.sql.avro.SchemaConverters` uses the `fixed` name so it's 
difficult to workaround it. Your approach sounds fine to me as long as it does 
not break any existing tests.  Left a couple of comments for changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2012: HUDI-1129 Deltastreamer Add support for schema evolution

2020-09-01 Thread GitBox


n3nash commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-685269655


   @sathyaprakashg Looks good to me. Can you please see why the build is 
failing ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org