Sasha,

1) "- Rather than streaming huge snapshots in a single message we should
>   provide streaming protocol with smaller messages and later reassembly on
>   the HDFS side."
Based on https://thrift.apache.org/docs/concepts, the Thrift transport can be
raw TCP or HTTP, and HTTP itself runs over TCP. TCP already splits the
application stream into segments that fit into IP packets, which in turn fit
into link-layer frames, and it reassembles them on the receiving side. So an
application such as Sentry does not need to handle this low-level work of
cutting a stream into small messages and reassembling it again. What is the
reason you want to do that? Did you see a performance issue? We can capture
packets on the wire to see the exact protocol stack Thrift uses, and then
decide whether a configuration change would improve performance.
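
For discussion, here is a minimal sketch of what application-level chunking
and reassembly would involve. It is plain Python, not Sentry or Thrift code;
the function names, the sample path, and the 4096-byte chunk size are all
made up for illustration.

```python
from typing import Iterable, Iterator


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes.

    Hypothetical helper: stands in for sending a snapshot as several
    smaller messages instead of one large one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Concatenate chunks, in the order received, back into one payload."""
    return b"".join(chunks)


# Illustrative data only: a fake snapshot made of one repeated path.
snapshot = b"/user/hive/warehouse/db1/table1\n" * 1000
chunks = list(split_into_chunks(snapshot, 4096))
restored = reassemble(chunks)
```

This only shows the mechanics. Whether it is worth doing on top of what TCP
already does depends on what we actually measure.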


2) "  - Most of the information passed are long strings with common prefixes.
> 
>   We should be able to apply simple compression techniques (e.g. prefix
>   compression) or even run a full compression on the data before sending."
Based on http://thrift-tutorial.readthedocs.io/en/latest/thrift-stack.html, 
Thrift already supports more compact wire encodings. We can configure its 
protocol as TDenseProtocol or TCompactProtocol.
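
To make the "prefix compression" idea in the quote concrete, here is a rough
sketch of front coding over a sorted list of paths: each entry stores how many
leading characters it shares with the previous entry, plus the remaining
suffix. This is my own illustration in plain Python, not something Thrift or
Sentry provides; the function names and sample paths are hypothetical.

```python
import os
from typing import List, Tuple


def front_encode(paths: List[str]) -> List[Tuple[int, str]]:
    """Encode each path as (shared_prefix_length, suffix) relative to the
    previous path. Works best when the input is sorted."""
    encoded = []
    previous = ""
    for path in paths:
        # commonprefix compares character by character, not per path component.
        shared = len(os.path.commonprefix([previous, path]))
        encoded.append((shared, path[shared:]))
        previous = path
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original paths from (shared_prefix_length, suffix) pairs."""
    paths = []
    previous = ""
    for shared, suffix in encoded:
        path = previous[:shared] + suffix
        paths.append(path)
        previous = path
    return paths


# Illustrative paths only.
paths = [
    "/user/hive/warehouse/db1/t1",
    "/user/hive/warehouse/db1/t2",
    "/user/hive/warehouse/db2/t1",
]
encoded = front_encode(paths)
decoded = front_decode(encoded)
```

If we went this way, the encoded pairs would still need to be carried in some
message format, so this is separate from the choice of Thrift protocol.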

3) "  - We should consider using non-thrift data structures for passing the
> 
>   info and just use Thrift as a transport mechanism."
What is the reason you want to make this change? 
Based on
https://stackoverflow.com/questions/9732381/why-thrift-why-not-http-rpcjsongzip,
Thrift has several benefits.

Thanks,

Lina


> On Jun 27, 2017, at 5:44 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
> 
> Some food for thought.
> 
> Currently Sentry uses serialized Thrift structures to send a lot of
> information from the Sentry Server to the HDFS namenode plugin for the HDFS
> sync.
> 
> We should think of ways to optimize this protocol in several ways:
> 
> 
>   - Rather than streaming huge snapshots in a single message we should
>   provide streaming protocol with smaller messages and later reassembly on
>   the HDFS side.
>   - Most of the information passed are long strings with common prefixes.
>   We should be able to apply simple compression techniques (e.g. prefix
>   compression) or even run a full compression on the data before sending.
>   - We should consider using non-thrift data structures for passing the
>   info and just use Thrift as a transport mechanism.
> 
> - Sasha
