I've been working on a new processor that does Change Data Capture with 
Microsoft SQL Server. I followed Microsoft's documentation on how CDC works, 
and I've got some code that gets me the changes and is testing well. Right now, 
I don't actually have a processor, just a number of scripts that generate SQL, 
which I put into ExecuteSQL and QueryDatabaseTable processors, with 
QueryDatabaseTable using my as-yet incomplete NIFI-1706 
<https://github.com/apache/nifi/pull/2162>.

One of the reasons I don't have a processor yet is that I don't want to use 
the same output format as the MySQL CDC processor, and I don't want to put in 
the time if it is not going to get merged. The MySQL CDC processor uses JSON 
messages as the output format, but in MS SQL the CDC messages are rows in a 
table, so it's much more convenient to output them as records. Currently, I'm 
using Avro.

Questions:

  *   My output format doesn't have to be Avro, but given the source is rows in 
a table being returned by a ResultSet, using the JdbcCommon class makes a lot 
of sense to me. Can I move JdbcCommon to somewhere useful like 
nifi-avro-record-utils?
  *   I'll be looping through a list of tables and plan on committing the files 
immediately to the success relationship as each table's CDC records are pulled. 
I want to make sure that the max value tracking gets updated immediately too. 
Does calling setState on the State Manager cause an immediate state save? Is 
it safe to call repeatedly, assuming a single thread, during the execution of 
the processor?
  *   Are there any concerns with using a different output format than the 
MySQL CDC processor?
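
To make the second question concrete, here is a rough sketch of the loop I 
have in mind. It is not NiFi code: StateStore, InMemoryStateStore, 
ChangeSource and processTables are hypothetical stand-ins I made up for 
illustration, with StateStore playing the role of the State Manager. It only 
shows the pattern of one state save per table; whether the real setState 
persists immediately is exactly what I'm asking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CdcStateSketch {

    /** Hypothetical stand-in for the framework's state storage; not a NiFi API. */
    interface StateStore {
        Map<String, String> load();
        void save(Map<String, String> state);
    }

    /** In-memory store that counts saves, for illustration only. */
    static class InMemoryStateStore implements StateStore {
        private Map<String, String> current = new HashMap<>();
        int saveCount = 0;

        @Override
        public Map<String, String> load() {
            return new HashMap<>(current);
        }

        @Override
        public void save(Map<String, String> state) {
            current = new HashMap<>(state);
            saveCount++;
        }
    }

    /** Hypothetical source of CDC rows for one table. */
    interface ChangeSource {
        /** Returns the new max value seen, or lastMax if nothing changed. */
        String pullChanges(String table, String lastMax);
    }

    /**
     * Loops over the tables, "emits" each table that had changes, and saves
     * that table's max value right away, before moving on to the next table.
     */
    static List<String> processTables(List<String> tables,
                                      ChangeSource source,
                                      StateStore store) {
        List<String> emitted = new ArrayList<>();
        for (String table : tables) {
            Map<String, String> state = store.load();
            String lastMax = state.get(table);
            String newMax = source.pullChanges(table, lastMax);
            if (newMax != null && !newMax.equals(lastMax)) {
                emitted.add(table);        // stands in for the transfer to success
                state.put(table, newMax);
                store.save(state);         // one save per table with changes
            }
        }
        return emitted;
    }
}
```

With this shape, a failure partway through the list would leave the earlier 
tables' max values already saved, which is the behaviour I'm after.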

Thanks,
  Peter
