cdbartholomew opened a new pull request #6848:
URL: https://github.com/apache/pulsar/pull/6848


   ### Motivation
   
   The JDBC sink does not handle `null` field values. For example, suppose the 
field `example` can be null: the schema registered in Pulsar allows it, and the 
MySQL table has a column of the same name that is typed as double and also 
allows nulls. When messages are sent to the JDBC sink without that field, the 
sink logs an exception like this:
   
   ```
   21:08:38.472 [pool-5-thread-1] ERROR org.apache.pulsar.io.jdbc.JdbcAbstractSink - Got exception
   java.sql.SQLException: Data truncated for column 'example' at row 1
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:127) ~[mysql-connector-java-8.0.11.jar:8.0.11]
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:95) ~[mysql-connector-java-8.0.11.jar:8.0.11]
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) ~[mysql-connector-java-8.0.11.jar:8.0.11]
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:960) ~[mysql-connector-java-8.0.11.jar:8.0.11]
        at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:388) ~[mysql-connector-java-8.0.11.jar:8.0.11]
        at org.apache.pulsar.io.jdbc.JdbcAbstractSink.flush(JdbcAbstractSink.java:202) ~[pulsar-io-jdbc-2.5.0.nar-unpacked/:?]
        at org.apache.pulsar.io.jdbc.JdbcAbstractSink.lambda$open$0(JdbcAbstractSink.java:108) ~[pulsar-io-jdbc-2.5.0.nar-unpacked/:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_232]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_232]
   ```
   The JDBC sink code had no handling for the case where a field is `null`. 
This PR adds that handling, along with unit tests that cover it for both binary 
and JSON encoding of the schema.
   
   ### Modifications
   
   When the sink encounters a `null` field value, it calls the `setColumnNull` 
method so that the corresponding column in the database row is set to null.
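
As a rough illustration of the idea (not the code in this PR), a null-aware binding step might look like the sketch below. It assumes the `null` case ends up as a standard JDBC `PreparedStatement.setNull` call with the column's SQL type. The `NullBindingSketch` class, the `bindNullableDouble` helper, and the recording stub are hypothetical names made up for this example, and the stub only stands in for a real MySQL statement so that the snippet runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

public class NullBindingSketch {

    // Hypothetical helper: bind a nullable double column.
    // A missing (null) value is bound as SQL NULL instead of being passed to setDouble.
    static void bindNullableDouble(PreparedStatement statement, int index, Double value)
            throws SQLException {
        if (value == null) {
            statement.setNull(index, Types.DOUBLE);
        } else {
            statement.setDouble(index, value);
        }
    }

    // Stand-in for a real statement: records the name of every method called on it.
    static PreparedStatement recordingStatement(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                NullBindingSketch.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    calls.add(method.getName());
                    return null; // setNull and setDouble both return void
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        PreparedStatement statement = recordingStatement(calls);

        bindNullableDouble(statement, 1, null); // field absent from the message
        bindNullableDouble(statement, 2, 3.5);  // field present

        System.out.println(calls);
    }
}
```

Passing the SQL type code to `setNull` is the standard JDBC way to bind a SQL `NULL` to a typed parameter; how the sink maps each schema field to a type code is not shown here.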
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
     - *Run the unit test that sends null values and validates, with a SQL 
query on the resulting table, that they are stored as null*
     - *Run the unit test that covers JSON encoding, since `null`s are handled 
differently depending on the encoding*
   
   In addition, this change has been verified by installing the new NAR into an 
existing cluster, sending messages with `null` values to a MySQL database, and 
confirming that the resulting rows are properly represented.
   
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (no)
   



