chinmay-032 commented on issue #8625:
URL: https://github.com/apache/hudi/issues/8625#issuecomment-1545603221

   #### New insights after discussing with @ad1happy2go:
   The problem arises when using a dynamically created StructType schema. When 
a statically declared schema is used, the updates work fine. That is, something 
like: 
   
   ```
   from pyspark.sql.types import (ArrayType, BooleanType, DoubleType, StringType,
                                  StructField, StructType, TimestampType)

   json_schema = StructType([
       StructField('profile_id', StringType(), True),
       StructField('timestamp', TimestampType(), True),
       StructField('id', StringType(), True),
       StructField('Enjoy', BooleanType(), True),
       StructField('DOB', ArrayType(StringType(), False), True),
       StructField('zip', StringType(), True),
       StructField('country', StringType(), True),
       StructField('email_vendor', StringType(), True),
       StructField('city', StringType(), True),
       StructField('active_audience', DoubleType(), True),
       StructField('last_name', StringType(), True),
       StructField('migrated_from', StringType(), True),
       StructField('product_range', ArrayType(StringType(), False), True),
       StructField('email_sub', BooleanType(), True),
       StructField('sms_vendor', StringType(), True),
       StructField('audience_count', DoubleType(), True),
       StructField('whatsapp_vendor', StringType(), True),
       StructField('first_name', StringType(), True),
   ])
   
   . 
   .
   <rest of program>
   ```
   works as expected. However, when the schema is fetched dynamically from an 
internal service: 
   ```
   import requests
   from pyspark.sql.types import (ArrayType, BooleanType, DoubleType, StringType,
                                  StructField, StructType, TimestampType)

   def get_structfield(fieldname, typestring):
       # Map the service's type string to a nullable Spark field.
       if typestring == "NUMBER":
           return StructField(fieldname, DoubleType(), True)
       elif typestring == "BOOLEAN":
           return StructField(fieldname, BooleanType(), True)
       elif typestring == "RELATIVE_TIME":
           return StructField(fieldname, TimestampType(), True)
       elif typestring == "LIST_DOUBLE":
           return StructField(fieldname, ArrayType(DoubleType(), False), True)
       elif typestring == "LIST_STRING":
           return StructField(fieldname, ArrayType(StringType(), False), True)

       # Any other type string falls back to a nullable string column.
       return StructField(fieldname, StringType(), True)
   
   def get_schema(shop_id):
       url = "my-url"
       params = {
          ## my-params
       }
       response = requests.get(url, params=params)
       if response.status_code == 200:
           response_json = response.json()
           schemaDict = response_json["data"]
           LOGGER.info(f"Table profile_{shop_id}: Fetch Schema Successful")
           schema = StructType()
           for key in schemaDict:
               schema.add(get_structfield(key, schemaDict[key]))
           return schema
       else:
           LOGGER.info(f"Table profile_{shop_id}: Fetch Schema Failed")
           raise Exception("Could not get schema")
   ```
   does not update properly. 
   As our use case depends heavily on fetching the schema dynamically, we 
cannot adopt the static approach and are looking for a solution. 
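
Since the static schema works and the dynamic one does not, one way to narrow this down is to diff the two schemas and see exactly where they disagree (field names, types, nullability, or column order). The following is only a minimal diagnostic sketch, not a confirmed fix: `diff_schemas` is a hypothetical helper, and it assumes both schemas are passed in the dict form returned by `StructType.jsonValue()`.

```python
def diff_schemas(expected, actual):
    """List the differences between two schemas given as jsonValue()-style dicts.

    Each dict is expected to look like:
    {"type": "struct", "fields": [{"name": ..., "type": ..., "nullable": ...}, ...]}
    """
    exp_fields = {f["name"]: f for f in expected["fields"]}
    act_fields = {f["name"]: f for f in actual["fields"]}
    problems = []

    # Fields present in only one of the two schemas.
    for name in sorted(exp_fields.keys() - act_fields.keys()):
        problems.append(f"missing field: {name}")
    for name in sorted(act_fields.keys() - exp_fields.keys()):
        problems.append(f"unexpected field: {name}")

    # Fields present in both: compare type and nullability.
    for name in sorted(exp_fields.keys() & act_fields.keys()):
        exp, act = exp_fields[name], act_fields[name]
        if exp["type"] != act["type"]:
            problems.append(
                f"type mismatch for {name}: {exp['type']} != {act['type']}"
            )
        if exp["nullable"] != act["nullable"]:
            problems.append(
                f"nullable mismatch for {name}: "
                f"{exp['nullable']} != {act['nullable']}"
            )

    # Same set of fields but in a different order.
    exp_order = [f["name"] for f in expected["fields"]]
    act_order = [f["name"] for f in actual["fields"]]
    if exp_order != act_order and sorted(exp_order) == sorted(act_order):
        problems.append("same fields, different column order")

    return problems
```

With the two snippets above, this would be called as `diff_schemas(json_schema.jsonValue(), get_schema(shop_id).jsonValue())`. An empty list would mean the schemas are identical and the cause lies elsewhere.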


