chinmay-032 commented on issue #8625: URL: https://github.com/apache/hudi/issues/8625#issuecomment-1545603221
#### New insights after discussing with @ad1happy2go:

The problem arises when using a dynamically created `StructType` schema. When a statically declared schema is used, the updates work fine. That is, something like:

```
from pyspark.sql.types import (
    ArrayType, BooleanType, DoubleType, StringType,
    StructField, StructType, TimestampType,
)

json_schema = StructType([
    StructField('profile_id', StringType(), True),
    StructField('timestamp', TimestampType(), True),
    StructField('id', StringType(), True),
    StructField('Enjoy', BooleanType(), True),
    StructField('DOB', ArrayType(StringType(), False), True),
    StructField('zip', StringType(), True),
    StructField('country', StringType(), True),
    StructField('email_vendor', StringType(), True),
    StructField('city', StringType(), True),
    StructField('active_audience', DoubleType(), True),
    StructField('last_name', StringType(), True),
    StructField('migrated_from', StringType(), True),
    StructField('product_range', ArrayType(StringType(), False), True),
    StructField('email_sub', BooleanType(), True),
    StructField('sms_vendor', StringType(), True),
    StructField('audience_count', DoubleType(), True),
    StructField('whatsapp_vendor', StringType(), True),
    StructField('first_name', StringType(), True),
])
.
.
<rest of program>
```

works as expected.
However, when using a schema fetched dynamically from an internal service:

```
import logging

import requests
from pyspark.sql.types import (
    ArrayType, BooleanType, DoubleType, StringType,
    StructField, StructType, TimestampType,
)

# Assumed here so the snippet stands alone; defined elsewhere in the real program.
LOGGER = logging.getLogger(__name__)


def get_structfield(fieldname, typestring):
    # Map the service's type string to a nullable Spark field.
    if typestring == "NUMBER":
        return StructField(fieldname, DoubleType(), True)
    elif typestring == "BOOLEAN":
        return StructField(fieldname, BooleanType(), True)
    elif typestring == "RELATIVE_TIME":
        return StructField(fieldname, TimestampType(), True)
    elif typestring == "LIST_DOUBLE":
        return StructField(fieldname, ArrayType(DoubleType(), False), True)
    elif typestring == "LIST_STRING":
        return StructField(fieldname, ArrayType(StringType(), False), True)
    # Any other type string falls back to a string column.
    return StructField(fieldname, StringType(), True)


def get_schema(shop_id):
    url = "my-url"
    params = {
        ## my-params
    }
    response = requests.get(url, params=params)
    if response.status_code == 200:
        response_json = response.json()
        schemaDict = response_json["data"]
        LOGGER.info(f"Table profile_{shop_id}: Fetch Schema Successful")
        # Fields are added in the order the service returns its keys.
        schema = StructType()
        for key in schemaDict:
            schema.add(get_structfield(key, schemaDict[key]))
        return schema
    else:
        LOGGER.info(f"Table profile_{shop_id}: Fetch Schema Failed")
        raise Exception("Could not get schema")
```

does not update properly. Our use case depends heavily on fetching the schema dynamically, so we cannot adopt the static approach and are looking for a solution.
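
To narrow down what actually differs between the two schemas, a small comparison helper may be useful. The sketch below is only a diagnostic aid, not a fix, and it does not assume any particular root cause. `describe_diff` is a hypothetical helper name, and it works on plain `(name, type_string, nullable)` tuples so that it can run without Spark. With PySpark, such a list can be built from a `StructType` with `[(f.name, f.dataType.simpleString(), f.nullable) for f in schema.fields]`.

```python
def describe_diff(static_fields, dynamic_fields):
    """Compare two schemas given as lists of (name, type_string, nullable).

    Returns a dict listing fields missing from or extra in the dynamic
    schema, fields whose type or nullability differ, and whether the
    field order is identical.
    """
    static_by_name = {n: (t, nullable) for n, t, nullable in static_fields}
    dynamic_by_name = {n: (t, nullable) for n, t, nullable in dynamic_fields}

    report = {
        "missing_in_dynamic": sorted(set(static_by_name) - set(dynamic_by_name)),
        "extra_in_dynamic": sorted(set(dynamic_by_name) - set(static_by_name)),
        "mismatched": {},
        # Order matters for positional writes, so report it separately.
        "same_order": [n for n, _, _ in static_fields]
        == [n for n, _, _ in dynamic_fields],
    }

    # Fields present in both schemas but with a different type or nullability.
    for name in sorted(set(static_by_name) & set(dynamic_by_name)):
        if static_by_name[name] != dynamic_by_name[name]:
            report["mismatched"][name] = {
                "static": static_by_name[name],
                "dynamic": dynamic_by_name[name],
            }
    return report


if __name__ == "__main__":
    # Illustrative values only; not taken from the real service response.
    static = [("profile_id", "string", True), ("DOB", "array<string>", True)]
    dynamic = [("DOB", "string", True), ("profile_id", "string", True)]
    print(describe_diff(static, dynamic))
```

If the report shows a different field order or a different type for some column (for example, the service returning a type string that falls through to the `StringType` default), that would be a concrete lead to share on this issue.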