I've been working on a new processor that does Change Data Capture with Microsoft SQL Server. I followed Microsoft's documentation on how CDC works, and I have code that retrieves the changes and is testing well. Right now I don't actually have a processor, just a number of scripts that generate SQL, which I run through ExecuteSQL and QueryDatabaseTable processors, with QueryDatabaseTable using my as-yet incomplete NIFI-1706<https://github.com/apache/nifi/pull/2162>.
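To illustrate the kind of SQL generation I mean, here is a minimal sketch. It is not my actual scripts. It follows the query pattern from Microsoft's CDC documentation (`sys.fn_cdc_get_min_lsn`, `sys.fn_cdc_get_max_lsn`, and `cdc.fn_cdc_get_all_changes_<capture_instance>`). The class name `CdcQuerySketch`, the method `buildChangesQuery`, and the capture instance `dbo_Orders` are hypothetical names used only for this example, and the name check is an assumption about what a reasonable capture instance name looks like.

```java
import java.util.regex.Pattern;

public final class CdcQuerySketch {

    // Assumption: capture instance names are plain identifiers.
    // Anything else is rejected so it cannot be spliced into the SQL text.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /**
     * Builds a T-SQL batch that selects every change row for one capture
     * instance, from the lowest LSN still available up to the current maximum.
     */
    public static String buildChangesQuery(String captureInstance) {
        if (captureInstance == null || !SAFE_NAME.matcher(captureInstance).matches()) {
            throw new IllegalArgumentException(
                    "Unexpected capture instance name: " + captureInstance);
        }
        return String.join("\n",
                "DECLARE @from_lsn binary(10), @to_lsn binary(10);",
                "SET @from_lsn = sys.fn_cdc_get_min_lsn('" + captureInstance + "');",
                "SET @to_lsn = sys.fn_cdc_get_max_lsn();",
                "SELECT * FROM cdc.fn_cdc_get_all_changes_" + captureInstance
                        + "(@from_lsn, @to_lsn, N'all');");
    }

    public static void main(String[] args) {
        // Print the generated batch for a hypothetical capture instance.
        System.out.println(buildChangesQuery("dbo_Orders"));
    }
}
```

The generated text is what would be pasted into ExecuteSQL. An incremental pull would replace the minimum LSN with the last value already processed, which is where the max value tracking comes in.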
One of the reasons I don't have a processor yet is that I don't want to use the same output format as the MySQL CDC Processor, and I didn't want to put in the time if it was not going to get merged. The MySQL CDC processor uses JSON messages as the output format, but in MS SQL the CDC messages are rows in a table, and it's much more convenient to output them as records. Currently, I'm using Avro.

Questions:

* My output format doesn't have to be Avro, but given that the source is rows in a table returned by a ResultSet, using the JdbcCommon class makes a lot of sense to me. Can I move JdbcCommon to somewhere useful like nifi-avro-record-utils?
* I'll be looping through a list of tables and plan on committing the files to the success relationship immediately, as each table's CDC records are pulled. I want to make sure that the max value tracking gets updated immediately too. Does calling setState on the State Manager cause an immediate state save? Is this safe to call repeatedly, assuming a single thread, during the execution of the processor?
* Are there any concerns with using a different output format than the MySQL CDC Processor?

Thanks,
Peter