the-other-tim-brown opened a new pull request, #6761:
URL: https://github.com/apache/hudi/pull/6761

   ### Change Logs
   
   If a user provides a recursive proto schema, the write to Parquet fails. 
This change lets the user specify how many levels of recursion to keep before 
the remaining data is truncated. 
   
   Main changes to existing code:
    - ProtoClassBasedSchemaProvider tracks the number of times a message 
descriptor is seen within a branch of the schema traversal.
    - Once the number of times that descriptor is seen exceeds the 
user-provided limit, the field is set to a preset record that contains two 
fields: 1) the remaining data serialized as a proto byte array, and 2) the 
descriptor's full name, for context about what is in that byte array.
    - Converting from a proto message to an Avro record now accounts for this 
truncation of the input.
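
The per-branch counting described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MessageType`, `RecursionLimitSketch`, `plan`, and `maxSeen` are hypothetical names, the descriptor is modeled with a plain class instead of the protobuf API, and the truncation record is represented only by a `TRUNCATED(...)` marker string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a proto message descriptor (not the protobuf API).
final class MessageType {
    final String fullName;
    // Field name -> nested message type; scalar fields are omitted for brevity.
    final Map<String, MessageType> fields = new LinkedHashMap<>();

    MessageType(String fullName) {
        this.fullName = fullName;
    }
}

final class RecursionLimitSketch {
    // Walks the schema and reports, per field path, whether the field keeps its
    // full record shape or is replaced by the truncation record.
    static List<String> plan(MessageType root, int maxSeen) {
        List<String> out = new ArrayList<>();
        visit(root, root.fullName, new HashMap<>(), maxSeen, out);
        return out;
    }

    private static void visit(MessageType type, String path,
                              Map<String, Integer> seenOnBranch,
                              int maxSeen, List<String> out) {
        int seen = seenOnBranch.getOrDefault(type.fullName, 0) + 1;
        if (seen > maxSeen) {
            // Limit exceeded: stand-in for the preset record that would hold the
            // serialized bytes plus the descriptor's full name.
            out.add(path + " -> TRUNCATED(" + type.fullName + ")");
            return;
        }
        out.add(path + " -> RECORD(" + type.fullName + ")");
        // Copy the counts so that sibling branches do not affect each other.
        Map<String, Integer> next = new HashMap<>(seenOnBranch);
        next.put(type.fullName, seen);
        for (Map.Entry<String, MessageType> field : type.fields.entrySet()) {
            visit(field.getValue(), path + "." + field.getKey(), next, maxSeen, out);
        }
    }
}
```

For a self-referencing message such as `demo.Node` with a `child` field of its own type, `plan(node, 2)` keeps the root and the first `child` as records and marks `child.child` as truncated. Copying the count map per child is what makes the limit apply per branch rather than across the whole schema.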
   
   ### Impact
   
   As part of this change, I needed to change how the namespace is set for the 
records within the Avro schema. Since we cannot repeat the exact same namespace 
+ name, I made the namespace the path within the schema being traversed, so 
each instance of a recursive message class has a unique full name.
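
As a rough illustration of the naming idea, the sketch below derives a full name from the field path leading to a record. The class `NamespaceSketch`, the method `fullName`, and the dot-joined path format are assumptions made for this example; the exact namespace format produced by the PR may differ.

```java
import java.util.List;

final class NamespaceSketch {
    // Builds an Avro-style full name (namespace + "." + name) for a record
    // reached through the given field path from the root message.
    static String fullName(String rootNamespace, List<String> fieldPath, String recordName) {
        StringBuilder namespace = new StringBuilder(rootNamespace);
        for (String field : fieldPath) {
            // Each traversed field extends the namespace (hypothetical format).
            namespace.append('.').append(field);
        }
        return namespace + "." + recordName;
    }
}
```

With this scheme, a recursive `Node` message at the root and the same message under its `child` field get different full names (`demo.Node` and `demo.child.Node` in this hypothetical format), so the same record name is never defined twice under one namespace.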
   
   Marking this as low risk since protobuf support is scheduled for the 
0.13.0 release.
   
   **Risk level: low**
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

