[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-16 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1550022043

   @Sam-Serpoosh We need to look at the source of the data. How are you running 
Debezium, and what configuration are you using there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-15 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1548675221

   @Sam-Serpoosh let's clarify a few things. The Avro schema is valid and yes, 
Hudi can handle it. My question is simply why you are having this issue when 
tens of other users and companies, including myself, are able to get Debezium 
topics without the extra level of nesting. You can see in the link I posted 
previously that there is an expected schema coming from Debezium, but your data 
is not ending up in that format. How are you deploying Debezium and what are 
your configs?





[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-11 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1544025115

   @Sam-Serpoosh nice find! This is not what we've seen in practice when using 
Debezium. You could work around this, but let's see if we can get rid of this 
nesting so the data is easier to work with. I don't see anything about this in 
the Debezium docs: 
https://debezium.io/documentation/reference/stable/connectors/postgresql.html
   
   Can you let me know which versions of Debezium and Postgres you are running?





[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-10 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1542645730

   @Sam-Serpoosh for the first issue regarding the schema, this is because we 
are fetching that schema as a string. If that class is not defined in the 
string, we won't know how it is defined. Maybe there is some argument to pass 
to the API to also get the schemas that this schema relies on? 
   
   For the second, it is hard to tell without looking at your data. If you pull 
the data locally and step through, you may have a better shot at understanding 
it. The main thing I have seen trip people up is the requirements for the 
delete records in the topic. You can also try out the same patch Sydney posted 
above for filtering out the tombstones in Kafka.
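
   As an illustration of what filtering tombstones means (this is not the 
patch referenced above, only a minimal sketch): a Kafka tombstone is a record 
whose value is null, so it can be dropped before the value ever reaches the 
Avro deserializer. The `KafkaRecord` alias and `drop_tombstones` helper are 
hypothetical names, and records are modeled as plain `(key, value)` tuples.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for a consumed Kafka record: (key bytes, value bytes or None).
KafkaRecord = Tuple[bytes, Optional[bytes]]


def drop_tombstones(records: Iterable[KafkaRecord]) -> Iterator[KafkaRecord]:
    """Yield only records that carry a value; tombstones have value None."""
    for key, value in records:
        if value is not None:
            yield key, value


if __name__ == "__main__":
    batch = [(b"1", b"payload-a"), (b"1", None), (b"2", b"payload-b")]
    print(list(drop_tombstones(batch)))
```

   Filtering this early keeps the null-valued record from reaching code that 
assumes every value can be decoded into a change event.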





[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-01 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1530683969

   
   @samserpoosh you can confirm whether this is the case by deserializing the 
data with `org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer` without 
the extra postgres debezium logic. I am unfamiliar with the settings that come 
with that so it could be leading to issues parsing the records. Do you have a 
stacktrace?





[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-01 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1530680623

   > Thank you @the-other-tim-brown for the explanation/confirmation. That is 
what we have assumed as well, but it seems with our configuration we are unable 
to parse the event with `before.*` and `op: "d"`, seems Deltastreamer just sees 
it as an error.
   > 
   > We are using COW table, Upsert mode if that makes a difference.
   
   @sydneyhoran that shouldn't make a difference. Can you expand your initial 
stacktrace to show what caused the NPE, including the method that threw it? 
Can you also confirm that the `before` field in your Kafka topic is set for 
all records? Without it the Debezium logic in Hudi will break.
   
   Did you get any stacktrace for the errors? It could be due to a schema 
incompatibility issue.
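
   One way to run that check on a sample of the topic, as a rough sketch: 
assuming the events have already been decoded into Python dicts with the usual 
Debezium envelope keys (`op`, `before`, `after`), count how many events of each 
operation type have no `before` image. The function name `count_missing_before` 
is made up for this example. Create events (`op: "c"`) normally have no 
`before` image, so the counts that matter are the ones for `u` and `d`.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_missing_before(events: Iterable[Dict[str, Any]]) -> Counter:
    """Count events whose `before` image is absent or null, grouped by `op`."""
    missing: Counter = Counter()
    for event in events:
        if event.get("before") is None:
            missing[event.get("op")] += 1
    return missing


if __name__ == "__main__":
    sample = [
        {"op": "c", "before": None, "after": {"id": 1}},
        {"op": "u", "before": {"id": 1}, "after": {"id": 1}},
        {"op": "d", "before": None, "after": None},
    ]
    print(count_missing_before(sample))
```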





[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-04-27 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1526004722

   @sydneyhoran I'm still trying to get up to speed on the errors you are 
seeing, but I can chime in on the behavior of the PostgresDebeziumSource and 
the payload. The 
[source](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/debezium/PostgresDebeziumSource.java#L79)
 does something similar to selecting `before.*` or `after.*`, along with 
pulling out some metadata from the Debezium record. The payload marks the row 
for deletion if the `op` is `d`. In order to properly delete the record, the 
`before` field needs to be set for deletions so you can extract the proper 
`id`, `inserted_at`, and `updated_at` values, so Hudi knows which record to 
delete, which partition it is in, and whether it is the latest update for that 
record.
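
   To make that concrete, here is a minimal sketch of the idea in Python. It 
is not the Hudi implementation: the function name and the `_op`, `_ts_ms`, and 
`_is_deleted` output columns are invented for illustration, and the input is 
assumed to be an already-decoded Debezium envelope with `op`, `before`, 
`after`, and `ts_ms` keys.

```python
from typing import Any, Dict, Optional


def flatten_debezium_record(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten one decoded change event into a single row.

    Deletes take their columns from `before`; every other operation takes
    them from `after`. Returns None when the needed image is missing, which
    is the situation that makes a delete impossible to apply.
    """
    op = record.get("op")
    is_delete = op == "d"
    image = record.get("before") if is_delete else record.get("after")
    if image is None:
        return None
    row = dict(image)                 # copy so the input event is not mutated
    row["_op"] = op                   # hypothetical metadata column names
    row["_ts_ms"] = record.get("ts_ms")
    row["_is_deleted"] = is_delete
    return row


if __name__ == "__main__":
    delete_event = {
        "op": "d",
        "before": {"id": 7, "inserted_at": "2023-04-01", "updated_at": "2023-04-20"},
        "after": None,
        "ts_ms": 1682600000000,
    }
    print(flatten_debezium_record(delete_event))
```

   With a populated `before`, the flattened delete row still carries `id`, 
`inserted_at`, and `updated_at`; with a null `before`, there is nothing to 
build the key, partition, or ordering value from.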

