[ 
https://issues.apache.org/jira/browse/ATLAS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162871#comment-15162871
 ] 

Hemanth Yamijala commented on ATLAS-512:
----------------------------------------

There are a couple of ways of doing this. I am listing the approaches here 
with their pros and cons, to get feedback:

*Option 1: Move model registration out of the hooks and into an independent 
tool / script*
* Mechanics:
** Every integrating component will provide an implementation that will have 
the serialized model and a 'signature type'. A utility in Atlas will take these 
and call the create type API.
** This utility will encode the current logic of checking whether a type exists 
before registering it; the 'signature type' is used for this purpose.
** This utility should essentially be called before any entity creation happens 
from the hooks - so really a setup step for Atlas. There are a couple of ways 
of arranging this as well, I guess.
* Pros:
** The chief advantage is that we can make the model registration a one time 
activity done in a controlled environment.
** Because we are using an API, we can get feedback on the success or failure 
of the model registration and have the chance of acting on it.
** In the shorter term, this means fewer changes to the Atlas system. Mainly, 
it can be set up to not touch the Atlas server side at all.
* Cons:
** Depending on the implementation, there is a possibility that this 
registration does not happen (mostly due to human error), and entity 
creates/updates could fail due to that.
** Requires the Atlas server to be running for this to work. (In its defense, 
this is a one-time activity.)
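The check-then-create flow of the Option 1 utility could be sketched roughly as 
below. This is illustrative Python only; `type_exists` and `create_types` are 
hypothetical stand-ins for calls to the Atlas types API, not real client methods:

```python
import json

def register_model(signature_type, model_json, type_exists, create_types):
    """Idempotently register a component's serialized model.

    The 'signature type' is checked first; if Atlas already knows it,
    the whole model is assumed registered and nothing is done.
    """
    if type_exists(signature_type):
        return False  # already registered; re-running the tool is a no-op
    create_types(model_json)
    return True

# Example against an in-memory fake of the Atlas type system:
registry = set()
def type_exists(name):
    return name in registry
def create_types(model_json):
    registry.update(json.loads(model_json)["types"])

model = json.dumps({"types": ["hive_db", "hive_table"]})
assert register_model("hive_table", model, type_exists, create_types)      # first run registers
assert not register_model("hive_table", model, type_exists, create_types)  # second run skips
```

Since the utility uses a synchronous API call, the return value (or an 
exception) is exactly where the feedback mentioned in the pros would surface.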

*Option 2: Write type creations through Kafka from the hooks, instead of the 
API*
* Mechanics:
** A hook will send the type creations as notifications to the ATLAS_HOOK topic 
of Atlas.
** The hook will not check if the type is already registered. This implies 
client-side hooks like Storm's would write the types multiple times (unless 
they maintain state independently).
** The hook consumer of Atlas should be extended to process this new kind of 
message (like TYPE_CREATE - which is already there). The consumer should check 
if the type is already registered and, if so, not act on it; a log entry would 
help audit such an event. Otherwise, it calls the create type API.
* Pros:
** This retains the spirit of hooks auto-registering models; hence, the chance 
for errors is minimized.
** We remove the dependency on the Atlas server being available for everything 
the hooks do.
* Cons:
** We cannot give feedback on type registration in this mechanism.
** Since client-side hooks (which don't maintain state) can write types 
multiple times, there could be some load on Kafka (for things like the Hive 
CLI, this could be a non-trivial impact, though certainly not something Kafka 
cannot handle). It just feels wasteful to do so.
** This change is more intrusive as the Atlas server will need some 
modifications for this to work.
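To make the consumer-side handling in Option 2 concrete, here is a rough 
sketch of the dedup-and-audit logic. The message envelope and field names below 
are invented for illustration and do not match Atlas's actual notification 
classes:

```python
import json

def type_create_message(model_json):
    # Hypothetical envelope: tag the payload so the hook consumer can
    # dispatch on message kind, as it does for entity notifications.
    return json.dumps({"msgType": "TYPE_CREATE", "model": model_json})

def on_message(raw, registered, create_types, audit_log):
    msg = json.loads(raw)
    if msg["msgType"] != "TYPE_CREATE":
        return
    wanted = json.loads(msg["model"])["types"]
    new_types = [t for t in wanted if t not in registered]
    if not new_types:
        # Duplicate send from a stateless client-side hook: skip, but audit it.
        audit_log.append("types already registered, skipping: %s" % wanted)
        return
    create_types(new_types)

# A stateless client-side hook sends the same model on every run:
registered, audit_log = set(), []
create_types = registered.update
msg = type_create_message(json.dumps({"types": ["storm_topology"]}))
on_message(msg, registered, create_types, audit_log)  # registers the type
on_message(msg, registered, create_types, audit_log)  # duplicate: audited, no-op
assert registered == {"storm_topology"} and len(audit_log) == 1
```

Note the asymmetry this illustrates: the duplicate send is absorbed server 
side, but the sending hook never learns whether registration succeeded, which 
is the feedback gap listed in the cons.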

Personally, moving to Kafka seems like it will work well, chiefly because it 
retains the current spirit of auto-registration and does not introduce a setup 
step that could become error-prone for users managing Atlas. The only concern 
is with client components writing the type definitions multiple times. If this 
turns out to be a real issue, we may need the clients to maintain state in some 
persistent store.

Thoughts from others? 


> Decouple currently integrating components from availability of Atlas service 
> for raising metadata events
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-512
>                 URL: https://issues.apache.org/jira/browse/ATLAS-512
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>
> The components that currently integrate with Atlas (Hive, Sqoop, Falcon, 
> Storm) all communicate their metadata events using Kafka as a messaging 
> layer. This effectively decouples these components from the Atlas server. 
> However, all of these components have some initialization that checks if 
> their respective models are registered with Atlas. For components that 
> integrate on the server, like HiveServer2 and Falcon, this initialization is 
> a one time check and hence, is manageable. Others like Sqoop, Storm and the 
> Hive CLI are client side components and hence the initialization happens for 
> every run or session of these components. Invoking the initialization (and 
> the one time check) every time like this effectively means that the Atlas 
> server should be always available.
> This JIRA is to try and remove this dependency and thus truly decouple these 
> components.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)