[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1550022043 @Sam-Serpoosh We need to look to the source of the data. How are you running Debezium and what types of configurations do you have there? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1548675221 @Sam-Serpoosh let's clarify a few things. The avro schema is valid and yes hudi can handle it. My question is simply why are you having this issue when tens of other users/companies including myself are able to get debezium topics without the extra level of nesting. You can see in the link I posted previously that there is an expected schema coming from debezium but your data is not ending up in that format. How are you deploying debezium and what are your configs?
[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1544025115 @Sam-Serpoosh nice find! This is not what we've seen in practice when using debezium. You could work around this but let's see if we can get rid of this nesting so the data is easier to work with. I don't see anything in the debezium docs about this https://debezium.io/documentation/reference/stable/connectors/postgresql.html Can you let me know which version of debezium and postgres you are running?
[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1542645730 @Sam-Serpoosh for the first issue regarding the schema, this is because we are fetching that schema as a string. If that class is not defined in the string, we won't know how it is defined. Maybe there is some arg to pass to the api to get the schemas that this schema relies on as well? For the second, it is hard to tell without looking at your data. If you pull the data locally and step through, you may have a better shot of understanding. The main thing I have seen trip people up is the requirements for the delete records in the topic. You can also try out the same patch Sydney posted above for filtering out the tombstones in kafka.
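For readers following the tombstone suggestion: a Kafka tombstone is a record whose value is null, and Debezium emits one after each delete event by default. The sketch below is not the patch referred to in the thread (that patch is not reproduced here); it only illustrates the idea of dropping null-valued records before they reach the deserializer. The function name `drop_tombstones` and the `(key, value)` tuple shape are assumptions made for illustration.

```python
from typing import Any, Iterable, List, Optional, Tuple

Record = Tuple[Any, Optional[Any]]  # hypothetical (key, value) pair


def drop_tombstones(records: Iterable[Record]) -> List[Record]:
    """Return only the records that carry a value.

    A tombstone has a key but a null value, so there is nothing
    to deserialize and it is skipped here.
    """
    return [(key, value) for key, value in records if value is not None]


if __name__ == "__main__":
    sample = [("id-1", {"op": "d"}), ("id-1", None), ("id-2", {"op": "c"})]
    print(drop_tombstones(sample))
```

Filtering this early keeps a null value from ever being handed to the Avro deserializer, which is one plausible source of a NullPointerException, though the thread does not confirm that it is the cause here.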
[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1530683969 @samserpoosh you can confirm whether this is the case by deserializing the data with `org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer` without the extra postgres debezium logic. I am unfamiliar with the settings that come with that so it could be leading to issues parsing the records. Do you have a stacktrace?
[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1530680623

> Thank you @the-other-tim-brown for the explanation/confirmation. That is what we have assumed as well, but it seems with our configuration we are unable to parse the event with `before.*` and `op: "d"`, seems Deltastreamer just sees it as an error.
>
> We are using COW table, Upsert mode if that makes a difference.

@sydneyhoran that shouldn't make a difference. Can you expand your initial stacktrace to show what the NPE was caused by, including the method that threw it? Can you also confirm that the `before` field in your kafka topic is set for all records? Without it the debezium logic in Hudi will break. Did you get any stacktrace for the errors? It could be due to a schema incompatibility issue.
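One way to run the `before` check on a sample pulled from the topic is sketched below. It assumes the events have already been deserialized into plain dicts with the usual Debezium envelope keys (`op`, `before`, `after`). The function name `audit_before_field` is hypothetical, and this is not code from Hudi. It reports two cases separately: events with no `before` key at all, and delete events whose `before` is null.

```python
from typing import Any, Dict, Iterable, List, Tuple


def audit_before_field(
    events: Iterable[Dict[str, Any]],
) -> Tuple[List[int], List[int]]:
    """Return (indices missing the `before` key, indices of deletes with null `before`)."""
    missing_key: List[int] = []
    null_on_delete: List[int] = []
    for index, event in enumerate(events):
        if "before" not in event:
            # The envelope itself lacks the field.
            missing_key.append(index)
        elif event.get("op") == "d" and event["before"] is None:
            # A delete without a row image cannot be matched to a record.
            null_on_delete.append(index)
    return missing_key, null_on_delete


if __name__ == "__main__":
    sample = [
        {"op": "c", "before": None, "after": {"id": 1}},
        {"op": "d", "before": None, "after": None},
        {"op": "u", "after": {"id": 2}},
    ]
    print(audit_before_field(sample))
```

If either list comes back non-empty, that is worth including alongside the stacktrace when reporting back on the issue.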
[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException
the-other-tim-brown commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1526004722 @sydneyhoran I'm still trying to come up to speed on the errors you are seeing but I can chime in on the behavior for the PostgresDebeziumSource and the payload. The [source](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/debezium/PostgresDebeziumSource.java#L79) is going to be doing something similar to `before.*` or `after.*` along with pulling out some metadata from the debezium record. The payload will be marking the row for deletion if the `op` is `d`. In order to properly delete the record, the `before` field needs to be set for deletions so you can extract the proper `id`, `inserted_at`, and `updated_at` values so Hudi knows which record to delete, which partition it is in, and whether it is the latest update for that record.
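The flattening described in this comment can be pictured roughly as follows. This is a minimal Python sketch of the idea only, not the logic in `PostgresDebeziumSource` (which is Java and operates on Spark rows). The function name `flatten_debezium_event` and the output columns `_op` and `_is_deleted` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional


def flatten_debezium_event(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten one Debezium envelope into a single row, or None if no row image exists."""
    op = event.get("op")
    # Deletes carry the row image in `before`; other operations carry it in `after`.
    image = event.get("before") if op == "d" else event.get("after")
    if image is None:
        # Without a row image there is no id, partition value, or ordering
        # value to identify the record, so the event cannot be applied.
        return None
    row = dict(image)
    row["_op"] = op  # hypothetical metadata column
    row["_is_deleted"] = op == "d"  # hypothetical delete marker
    return row


if __name__ == "__main__":
    delete_event = {
        "op": "d",
        "before": {"id": 1, "inserted_at": "2023-01-01", "updated_at": "2023-01-02"},
        "after": None,
    }
    print(flatten_debezium_event(delete_event))
```

The `None` branch shows why a delete with an empty `before` is a problem: none of `id`, `inserted_at`, or `updated_at` is available to locate the record being deleted.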