[
https://issues.apache.org/jira/browse/HCATALOG-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493694#comment-13493694
]
Mithun Radhakrishnan commented on HCATALOG-546:
-----------------------------------------------
Hello, Travis. Thanks for the review. (I've updated the patch.)
1. Again, thanks for doing the measurements. The concern isn't so much that
they're too large now as that they could be smaller. Given the volume of
events we expect to be consuming in Oozie, we expect overall gains from
removing anything that's redundant.
2. A couple of thoughts about thrift:
a. We'd like to use/perpetuate as little of the thrift-struct definitions in
our interfaces as possible. At some point, I expect that the thrift bits will
be replaced. (webhcat comes to mind.)
b. We suspect that, as far as language bindings go, consuming JSON message
strings would be easier than using thrift.
c. We'd like to deliberately decouple notification-content from the contents
of the thrift-structs, not just because a lot of it is redundant/queryable, but
also because we'd then have the liberty to introduce new content to an
AlterPartitionEvent that might not be held in the thrift Partition.
(You're right, though. The Partition might still deserialize correctly even if
the struct changes. I thought it wouldn't, initially.)
d. One option could have been to serialize the whole notification message in
thrift. JSON was just simpler.
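To illustrate (b) and (c), here is a minimal sketch of a consumer that reads
only the fields it knows about from a JSON message and ignores everything else,
so adding a field doesn't break it. The class name and the field names
(eventType, db, table, newField) are hypothetical and not necessarily what the
patch emits. The regex lookup only handles flat string fields without escaped
quotes; a real consumer would use a proper JSON library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of the patch).
public class TolerantMessageReader {

    // Returns the value of a top-level string field, or empty if it is absent.
    // Fields the caller never asks for are simply ignored.
    static Optional<String> field(String json, String name) {
        Pattern p = Pattern.compile(
            "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // An "old" message and a "new" one carrying an extra field.
        String v1 = "{\"eventType\":\"ADD_PARTITION\",\"db\":\"sales\",\"table\":\"orders\"}";
        String v2 = "{\"eventType\":\"ADD_PARTITION\",\"db\":\"sales\","
                  + "\"table\":\"orders\",\"newField\":\"x\"}";

        // An existing consumer reads the same fields from both versions.
        System.out.println(field(v1, "table").orElse("?"));
        System.out.println(field(v2, "table").orElse("?"));
    }
}
```

A consumer written against the first message shape keeps working on the second,
and only consumers that care about the new field need to change.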
I thought I'd mention that after this has stabilized, the next step would be to
introduce support for logical (and atomic) "sets" of partitions. Serializing
just the partition-key-vals instead of whole Partition instances would yield
savings, especially if the sets tend to be large. And if the messages are
going to be persisted (say, for querying later), any space savings will
help.
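As a rough sketch of what "just the partition-key-vals" could look like for a
set of partitions: the message below carries the db and table names once, plus
one small key/value map per partition. The class name, method names and the
JSON layout are assumptions for illustration, not the format in the patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, for illustration only (not part of the patch).
public class PartitionSetMessage {

    // Builds a JSON string holding only what identifies each partition.
    static String toJson(String db, String table,
                         List<Map<String, String>> partitions) {
        String parts = partitions.stream()
            .map(kv -> kv.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}")))
            .collect(Collectors.joining(",", "[", "]"));
        return "{" + quote("db") + ":" + quote(db) + ","
             + quote("table") + ":" + quote(table) + ","
             + quote("partitions") + ":" + parts + "}";
    }

    // Wraps a string in double quotes, escaping backslashes and quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the partition keys in insertion order.
        Map<String, String> p1 = new LinkedHashMap<>();
        p1.put("ds", "2012-11-08");
        p1.put("region", "us");
        Map<String, String> p2 = new LinkedHashMap<>();
        p2.put("ds", "2012-11-08");
        p2.put("region", "eu");
        System.out.println(toJson("sales", "orders", Arrays.asList(p1, p2)));
    }
}
```

Everything else about a partition (storage descriptor, parameters, and so on)
stays in the metastore and can be queried using these key-values.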
Does that sound alright?
> Rework HCatalog's JMS Notifications
> ------------------------------------
>
> Key: HCATALOG-546
> URL: https://issues.apache.org/jira/browse/HCATALOG-546
> Project: HCatalog
> Issue Type: Bug
> Components: notification
> Affects Versions: 0.4.1
> Reporter: Mithun Radhakrishnan
> Assignee: Mithun Radhakrishnan
> Fix For: 0.4.1
>
> Attachments: HCATALOG-546.patch, sample.Add.Drop.Database.json,
> sample.Add.Drop.Partition.json, sample.Add.Drop.Table.json
>
>
> In 0.4.1, the NotificationListener listens for metastore operations and emits
> JMS notifications containing the entire metastore-objects
> (Database/Table/Partitions) in Java-serialized form. The assumption at the
> time was that consumers might need access to the whole object. This policy
> poses a couple of problems:
> 1. The notifications are verbose, since they convey a bunch of information
> that's available from querying the metastore anyway.
> 2. Consumers of these JMS notifications (e.g. Oozie) would now be dependent
> on the Java class definitions of metastore-objects. If they change, Oozie
> would also need to be restarted (with updated libs), to consume the
> notifications.
> Ideally, the notifications should convey only the minimum information that
> identifies the metastore-change unambiguously. (Everything else can be
> queried for.) They should be backward compatible. If new fields are added,
> existing consumers shouldn't break (unless they intend to consume the new
> fields). Also, the notification-format ought to be pluggable.
> For the initial rework, we're proposing to use a JSON-string to represent the
> notification-content. We're also proposing a helper-class for the likes of
> Oozie to use, that converts the strings to POJOs, in a backward-compatible
> fashion.
> I'll attach sample notifications and a tentative patch shortly.