[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-03 Thread via GitHub


sydneyhoran commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1533514539

   The reason I was getting an error on deleting records without tombstone was 
because we were testing by starting from a midpoint of a Kafka topic, so I 
suspect Deltastreamer didn't know what to do with `"d"` messages for records 
that didn't exist. Had to add a few more logging lines and check executor logs 
to find out what was going on.
   
   We'll make sure jobs are starting fresh from the start of topics and turn on 
`commit-on-errors` if needed to get around non-existent deletes (until there is 
a possible fix for that).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-03 Thread via GitHub


sydneyhoran commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1533511853

   Thanks to help from Aditya, @rmahindra123  and @nsivabalan , this was the 
fix that worked for us to filter out tombstones: 
https://github.com/sydneyhoran/hudi/commit/b864a69e27d50424b6984f28a31c3bd99a025762


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-01 Thread via GitHub


sydneyhoran commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-152991

   Thank you @the-other-tim-brown for the explanation/confirmation. That is 
what we have assumed as well, but it seems with our configuration we are unable 
to parse the event with `before.*` and `op: "d"`, seems Deltastreamer just sees 
it as an error


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-01 Thread via GitHub


sydneyhoran commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1529939285

   When we either turn off tombstones in Debezium, or filter them out in 
DebeziumSource.java, no null/tombstones are coming in (which is good). But we 
still get a "commit failed" and upon further inspection of the log I have found 
that before this error it says `Delta Sync found errors when writing. 
Errors/Total=14/9282`
   
   It seems that DeltaSync/WriteStatus.java is treating the deletes as 
"Errors". When I set `--commit-on-errors` to true, it allows the job to run but 
what happens to those "error" records? Shouldn't they be telling Deltastreamer 
to "delete" those records?
   
   ```
   23/05/01 13:49:26 ERROR org.apache.hudi.utilities.deltastreamer.DeltaSync: 
Delta Sync found errors when writing. Errors/Total=14/9282
   23/05/01 13:49:26 ERROR org.apache.hudi.utilities.deltastreamer.DeltaSync: 
Printing out the top 100 errors
   23/05/01 13:49:26 ERROR org.apache.hudi.utilities.deltastreamer.DeltaSync: 
Global error :
   23/05/01 13:49:26 ERROR org.apache.hudi.utilities.deltastreamer.DeltaSync: 
Global error :
   23/05/01 13:49:26 ERROR org.apache.hudi.utilities.deltastreamer.DeltaSync: 
Global error :
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-04-26 Thread via GitHub


sydneyhoran commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1524068006

   My team is trying to develop a Custom Transformer class that can skip over 
null (tombstone) records from PostgresDebezium Kafka Source to address this. We 
are attempting along the lines of:
   
   ```
   public class TombstoneTransformer implements Transformer {
   
 private static final Logger LOG = 
LoggerFactory.getLogger(TombstoneTransformer.class);
   
 @Override
 public Dataset apply(JavaSparkContext jsc, SparkSession sparkSession, 
Dataset rowDataset,
   TypedProperties properties) {
   LOG.info("TombstoneTransformer " + rowDataset);
   // // NullPointerException happens on the following line:
   // List rowList = rowDataset.collectAsList();
   // // NullPointerException happens on the following line:
   // newRowSet.collectAsList().stream().limit(50).forEach(x -> 
LOG.info(x.toString()));
   // // Later in DeltaSync, tombstone records appear to still be present 
and results in NullPointerException later
   // Dataset newRowSet = rowDataset.filter("_change_operation_type is 
not null");
   // // Later in DeltaSync, tombstone records appear to still be present 
and results in NullPointerException later
   Dataset newRowSet = rowDataset.filter(Objects::nonNull);
   return newRowSet;
 }
   }
   ```
   
   However, none of the attempts at filtering the rowDataset get rid of the 
NullPointerException later in the Deltastreamer ingestion. Moreso, many of the 
attempts to log/view the individual records in rowDataset result in 
NullPointerException . And so we are wondering if there is something earlier in 
the code (maybe the PostgresDebeziumSource.java that flattens messages?) that 
runs that could be allowing malformed Row objects to get passed to the Custom 
Transformer classes - that somehow is not allowing us to read/access the Rows 
and filter out the ones that are null (tombstone) records.
   Anyone that might have an idea for how to make this class work? Side note - 
we also tried SqlQueryBasedTransformer with “SELECT * FROM  a WHERE a.id 
is not null” and it also did not filter out Tombstones (still had NPE later 
during ingestion).
   
   Could someone explain what is happening with delete operations on 
`PostgresDebeziumSource` and `PostgresDebeziumAvroPayload` and why they 
potentially aren't being handled well? Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-04-26 Thread via GitHub


sydneyhoran commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1524065639

   Just as an update, we were able to set tombstones.on.delete to `false` in a 
lower environment and still got the following error after a delete op:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org