Martin Kleppmann created SAMZA-200:
--------------------------------------

             Summary: Explore using MySQL changelog as input stream
                 Key: SAMZA-200
                 URL: https://issues.apache.org/jira/browse/SAMZA-200
             Project: Samza
          Issue Type: New Feature
            Reporter: Martin Kleppmann
            Assignee: Martin Kleppmann


Samza is designed with good support for database changelogs, but the current 
open source release is mostly centered around Kafka. It would be good to have 
out-of-the-box support for some common databases, such as MySQL, as well.

[Databus|http://www.socc2012.org/s18-das.pdf?attredirects=0] is LinkedIn's 
change capture tool, but the current open source release focuses mainly on 
Oracle. There is an open source release of [Databus for 
MySQL|https://github.com/linkedin/databus/wiki/Databus-for-MySQL], but it's a 
proof-of-concept implementation, not the one used by LinkedIn in production. 
(The one used by LinkedIn requires a patched version of MySQL.) The open source 
Databus uses [Open Replicator|https://code.google.com/p/open-replicator/] to 
connect to a MySQL server as a slave, and parses the binlog to find any 
inserts, updates or deletes.

I played around a bit with Open Replicator today, and got it working: a small 
Scala program that could get a real-time feed of all changes happening in a 
MySQL database. However, I have some doubts about the quality of the library 
(the code is not very good, it has only very cursory tests, the original 
maintainer hasn't touched it for 18 months, and there are reports of nasty 
bugs, e.g. blowing up on any negative number). There don't seem to be any 
better Java binlog parsers out there. But I did skim the source of Open 
Replicator, and it's not too complicated, so it seems quite feasible to write 
a MySQL binlog parser ourselves.
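
As a rough indication of how small the core of such a parser could be, here is 
a minimal Java sketch (not taken from Open Replicator or Databus) that decodes 
the 19-byte common header of a binlog event and sorts row events into inserts, 
updates and deletes. The class, method and enum names are hypothetical. The 
sketch assumes row-based replication, binlog format version 4, and the row 
event type codes 23-25 (v1) and 30-32 (v2, MySQL 5.6). It ignores the event 
body, checksums and the replication handshake.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: decode the fixed-size header of a v4 binlog event.
class BinlogHeaderSketch {

    /** Length in bytes of the common event header in binlog format v4. */
    static final int HEADER_LENGTH = 19;

    /** Coarse classification of an event (illustrative names). */
    enum Change { INSERT, UPDATE, DELETE, OTHER }

    /** Decoded header fields; all integers are little-endian on the wire. */
    static final class Header {
        final long timestamp;    // seconds since the Unix epoch
        final int typeCode;      // event type, e.g. 30 = WRITE_ROWS_EVENT (v2)
        final long serverId;     // server that originally wrote the event
        final long eventLength;  // header plus body, in bytes
        final long nextPosition; // offset of the next event in the binlog
        final int flags;

        Header(long timestamp, int typeCode, long serverId,
               long eventLength, long nextPosition, int flags) {
            this.timestamp = timestamp;
            this.typeCode = typeCode;
            this.serverId = serverId;
            this.eventLength = eventLength;
            this.nextPosition = nextPosition;
            this.flags = flags;
        }
    }

    /** Parses the first 19 bytes of an event into a Header. */
    static Header parse(byte[] bytes) {
        if (bytes.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least 19 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        // Java has no unsigned types, so mask each field to avoid
        // sign-extension when the high bit is set.
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        int typeCode = buf.get() & 0xFF;
        long serverId = buf.getInt() & 0xFFFFFFFFL;
        long eventLength = buf.getInt() & 0xFFFFFFFFL;
        long nextPosition = buf.getInt() & 0xFFFFFFFFL;
        int flags = buf.getShort() & 0xFFFF;
        return new Header(timestamp, typeCode, serverId,
                          eventLength, nextPosition, flags);
    }

    /** Maps row event type codes (v1 and v2) to the kind of change. */
    static Change classify(int typeCode) {
        switch (typeCode) {
            case 23: case 30: return Change.INSERT; // WRITE_ROWS_EVENT
            case 24: case 31: return Change.UPDATE; // UPDATE_ROWS_EVENT
            case 25: case 32: return Change.DELETE; // DELETE_ROWS_EVENT
            default: return Change.OTHER;           // rotate, query, xid, ...
        }
    }

    public static void main(String[] args) {
        // Hand-built header for illustration; not captured from a real server.
        byte[] sample = ByteBuffer.allocate(HEADER_LENGTH)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1392076800).put((byte) 30).putInt(1)
                .putInt(52).putInt(1024).putShort((short) 0)
                .array();
        Header h = parse(sample);
        // prints "INSERT event, 52 bytes, next at 1024"
        System.out.println(classify(h.typeCode) + " event, "
                + h.eventLength + " bytes, next at " + h.nextPosition);
    }
}
```

Decoding the row images in the event body (column types, null bitmaps, the 
preceding table map event) is where most of the remaining work would be.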

This is still very much at an exploratory stage, but I think it could be 
really cool to have database changelog support easily available in Samza.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
