You don't need my permission. :)  I'm totally fine with doing it later; I just 
wanted to make sure we were keeping it in mind.

Alan.

On Nov 5, 2012, at 10:08 AM, Mithun Radhakrishnan wrote:

> Hello, Alan.
> 
> Agreed. I'd like to refactor NotificationListener for that express reason, 
> but with your permission, I'll do so as a follow-up JIRA, very soon.
> 
> Mithun
> 
> 
> ________________________________
> From: Alan Gates <[email protected]>
> To: [email protected] 
> Sent: Monday, November 5, 2012 8:27 AM
> Subject: Re: Add/delete-partition JMS message format proposal.
> 
> Looks good.  I definitely agree with shrinking the message size.  We can keep 
> this to a notification and let the client go to the metastore to get the 
> information it cares about.  
> 
> One comment I would make is that we should consider that, in time, we would 
> like to move this away from just sending messages via JMS to sending them via 
> other messaging protocols as well (HTTP, Kafka, etc.).  So we don't want to do 
> anything that binds this more tightly to JMS or ActiveMQ.  I don't see 
> anything in these changes that does, but I think it's good to call that out 
> as a design goal.
> 
> Alan.
> 
> On Oct 30, 2012, at 2:26 PM, Mithun Radhakrishnan wrote:
> 
>> Hello, HCat-Dev.
>> 
>> I'm working on modifying the HCat messages (sent over JMS/ActiveMQ, for 
>> partition-add/delete) so that clients (such as Oozie) would have an easier 
>> time with consumption.
>> Here are some limitations of what's available currently:
>> 1. The present implementation in HCatalog (branch-0.4/) seems to send the 
>> entire Partition (Java) instance in serialized fashion. Since the 
>> partition-parameters, hdfs-location etc. are all serialized, the messages 
>> are rather, emm, garrulous.
>> 2. There doesn't seem to be any support for versioning either. So when new 
>> fields are added, older clients won't work at all without an update.
>> 
>> Could we consider transmitting only that info which identifies the 
>> partitions that pertain to the operation (e.g. partition keys), and drop any 
>> information that might be gathered from querying the metadata (e.g. storage 
>> location, partition-parameters, etc.)?
>> 
>> We're also considering that the initial implementation encode the ActiveMQ 
>> payload in JSON.  Here's an example of the proposed message format for an 
>> "add_partition" operation:
>> 
>> "add_partition": {
>>    "hcat_server" : "thrift://my.hcat.server:9080",
>>    "hcat_service_principal" : "hcat/[email protected]",
>>    "db": "default",
>>    "table": "starling_jobs",
>>    "partitions":
>>      [
>>        {"grid": "AxoniteBlue", "dt": "2012_10_25"},// Sets of partition-keys.
>>        {"grid": "AxoniteBlue", "dt": "2012_10_26"},
>>        {"grid": "AxoniteBlue", "dt": "2012_10_27"},
>>        {"grid": "AxoniteBlue", "dt": "2012_10_28"}
>>      ],
>>    "timestamp": "1351534729" // In this case, interpreted as creation-time.
>> }
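
A minimal sketch of how a producer might assemble the message format proposed above and how a consumer might read it back. Python is used here purely for brevity. The function name `build_partition_message` and the principal placeholder are assumptions for illustration and are not part of HCatalog. Note that a strict JSON parser rejects `//` comments, so the sketch emits none.

```python
import json


def build_partition_message(operation, server, principal, db, table,
                            partitions, timestamp):
    """Return a JSON string whose single top-level key names the operation."""
    body = {
        "hcat_server": server,
        "hcat_service_principal": principal,
        "db": db,
        "table": table,
        # Each entry is one set of partition-keys, all values being strings.
        "partitions": partitions,
        "timestamp": str(timestamp),
    }
    return json.dumps({operation: body})


message = build_partition_message(
    "add_partition",
    "thrift://my.hcat.server:9080",
    "hcat/principal-placeholder",  # hypothetical value, for illustration only
    "default",
    "starling_jobs",
    [
        {"grid": "AxoniteBlue", "dt": "2012_10_25"},
        {"grid": "AxoniteBlue", "dt": "2012_10_26"},
    ],
    1351534729,
)

# Consumer side: the first (and only) key tells us which operation occurred.
decoded = json.loads(message)
operation, record = next(iter(decoded.items()))
partition_keys = record["partitions"]
```

The consumer learns only which partitions changed; anything else (storage location, partition-parameters) would still be fetched from the metastore.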
>> 
>> If we continue to use JMS MapMessages, we could consider having 3 keys in 
>> the map:
>> 1. version = "1" (for the first implementation. Increment as we go.)
>> 2. format = "json" (We could consider adding different formats if we choose.)
>> 3. message = <the json message body, as above.>
>> 
>> The version and format help a factory choose the right implementation to 
>> deserialize the message. (A client-side library we supply to Oozie should 
>> hide this and provide POJOs.)
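
The version/format dispatch described above could look roughly like the following sketch. It models the three-key map as a plain dictionary rather than a real JMS MapMessage, and the names `DESERIALIZERS`, `wrap` and `unwrap` are hypothetical, chosen only for this illustration.

```python
import json

# Registry keyed on (version, format). Supporting a new version or format
# would mean adding an entry here, without touching existing consumers.
DESERIALIZERS = {
    ("1", "json"): json.loads,
}


def wrap(body, version="1", fmt="json"):
    """Build the three-key map: version, format and the serialized message."""
    return {"version": version, "format": fmt, "message": json.dumps(body)}


def unwrap(envelope):
    """Pick a deserializer from the version and format, then decode."""
    key = (envelope["version"], envelope["format"])
    try:
        deserialize = DESERIALIZERS[key]
    except KeyError:
        raise ValueError("unsupported version/format: %r" % (key,))
    return deserialize(envelope["message"])


envelope = wrap({"add_partition": {"db": "default", "table": "starling_jobs"}})
decoded = unwrap(envelope)
```

A client that receives a version it does not know fails with an explicit error instead of misreading the payload.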
>> 
>> Since the "partitions" field is an array, and since the values corresponding 
>> to partition-keys are all strings, we'd be able to accommodate partial 
>> partitions-specs, or even wild-cards. This might help us add support for 
>> "mark-set-done" later on.
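
To illustrate how partial partition-specs or wild-cards might be matched on the client side, here is a small sketch. The matching rules are assumptions made for this example and are not defined by the proposal: a key omitted from the spec matches any value, and a literal "*" is treated as a wild-card.

```python
def spec_matches(spec, partition, wildcard="*"):
    """Return True if every key named in spec agrees with the partition.

    Keys absent from spec are unconstrained (a partial spec); a value equal
    to `wildcard` matches anything, provided the partition has that key.
    """
    for key, wanted in spec.items():
        if key not in partition:
            return False
        if wanted != wildcard and partition[key] != wanted:
            return False
    return True


partition = {"grid": "AxoniteBlue", "dt": "2012_10_25"}

partial = spec_matches({"grid": "AxoniteBlue"}, partition)
wild = spec_matches({"grid": "AxoniteBlue", "dt": "*"}, partition)
mismatch = spec_matches({"grid": "AxoniteBlue", "dt": "2012_10_26"}, partition)
```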
>> 
>> The first key ("add_partition", "drop_partition" or "alter_partition") 
>> indicates the operation, and the value indicates the record-body. (At first 
>> glance, the record-body doesn't change for these operations. But that might 
>> change, so we'll keep them distinct.)
>> 
>> Also note that HiveMetaStore::add_partitions_core() currently doesn't send 1 
>> message for the entire set of partitions being added. Instead we get one 
>> message per partition. This could be verbose and sub-optimal. We'll tackle 
>> this sort of thing after we've nailed the format down.
>> 
>> I'm toying with the idea of adding an "other" property, an array of 
>> key-values to accommodate stuff we hadn't considered, at "run-time" (like if 
>> we want to introduce a hack). The need for such a property is contingent on 
>> the behaviour of Jackson w.r.t. newly added properties in the record-body. 
>> (I'll run experiments and keep you posted.)
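
As a rough illustration of the concern above, the sketch below shows one way a reader could tolerate properties it does not know about by setting them aside rather than failing. It is written in Python and says nothing about how Jackson itself behaves; `KNOWN_FIELDS` and `read_record` are hypothetical names.

```python
import json

# Fields taken from the proposed record-body in this thread.
KNOWN_FIELDS = ("hcat_server", "hcat_service_principal", "db", "table",
                "partitions", "timestamp")


def read_record(body_json):
    """Split a record-body into the fields we understand and everything else."""
    raw = json.loads(body_json)
    known = {k: raw[k] for k in KNOWN_FIELDS if k in raw}
    other = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
    return known, other


known, other = read_record(
    '{"db": "default", "table": "starling_jobs", "new_field": "x"}'
)
```

If the deserializer already tolerates unknown properties in this way, an explicit "other" property may be unnecessary.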
>> 
>> What do you think?
>> 
>> Mithun
