Some food for thought. Currently Sentry uses serialized Thrift structures to send a lot of information from the Sentry Server to the HDFS namenode plugin for the HDFS sync.
We should think of ways to optimize this protocol:
- Rather than streaming huge snapshots in a single message, we should provide a streaming protocol that sends smaller messages, which are reassembled on the HDFS side.
- Most of the information passed consists of long strings with common prefixes. We should be able to apply simple compression techniques (e.g. prefix compression), or even compress the whole payload before sending it.
- We should consider using non-Thrift data structures for passing the information and use Thrift only as a transport mechanism.

- Sasha
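As a rough illustration of the second idea, here is a minimal sketch of prefix compression (front coding) followed by a full GZIP pass. It assumes the payload can be modeled as a plain list of strings such as paths. `PrefixCodec` and its methods are hypothetical names, not part of Sentry or Thrift. Each string is written as the length of the prefix it shares with the previous string plus the remaining suffix. The resulting bytes could then be carried inside a Thrift `binary` field, which also fits the third idea. Chunked streaming is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical front-coding codec for string lists (illustrative only, not a Sentry API). */
public final class PrefixCodec {
    private PrefixCodec() {}

    /** Number of leading chars that a and b have in common. */
    static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /**
     * Writes the count, then (shared-prefix length, suffix) per string, all gzipped.
     * Works for any order, but sorted input shares longer prefixes and compresses better.
     * Note: writeUTF limits each suffix to 65535 encoded bytes.
     */
    public static byte[] encode(List<String> paths) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(paths.size());
            String prev = "";
            for (String path : paths) {
                int shared = commonPrefix(prev, path);
                out.writeInt(shared);
                out.writeUTF(path.substring(shared));
                prev = path;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the streams above finishes the GZIP trailer before we read the bytes.
        return bytes.toByteArray();
    }

    /** Reverses encode(): rebuilds each string from the previous one plus its suffix. */
    public static List<String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            int count = in.readInt();
            List<String> paths = new ArrayList<>(count);
            String prev = "";
            for (int i = 0; i < count; i++) {
                int shared = in.readInt();
                String path = prev.substring(0, shared) + in.readUTF();
                paths.add(path);
                prev = path;
            }
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "/user/hive/warehouse/db1.db/t1",
                "/user/hive/warehouse/db1.db/t2",
                "/user/hive/warehouse/db2.db/t1");
        byte[] encoded = encode(paths);
        System.out.println(decode(encoded).equals(paths)); // prints true
    }
}
```

Front coding removes most of the redundancy cheaply and keeps decoding streamable. The GZIP pass then picks up whatever repetition remains, at the cost of some CPU on both ends.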
