[ 
https://issues.apache.org/jira/browse/ATLAS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167236#comment-15167236
 ] 

Hemanth Yamijala commented on ATLAS-512:
----------------------------------------

A few more discussions happened today, so here is an update on the latest thinking:

Regarding Option 1:
It has been pointed out that this option still requires the Atlas server to be
active for type registration. One potential way of managing this is to slightly
modify the mechanics of Option 1.
* Suppose that we provide a facility to 'bootstrap' the Atlas server with a set
of types. This can be done by simply dropping the bootstrap type definitions as
JSON files in a well-known directory.
* The Atlas server can, on startup, read these and register the types, provided
they are not already registered (see the sketch after this list).
* System-level and 'reserved' types, like the models representing the Hadoop
components (Hive, Falcon, Sqoop etc.), can be pre-registered this way one time.
* For the reserved types, the JSON can be generated as part of the Atlas build
itself.
* As [~bergenholtz] mentions, this has the added benefit that it prevents 
erroneous registration of reserved types.
* The issue I see with this approach is that it seems to split the type and
entity management of a component across two systems - the hooks and the Atlas
server. For instance, if the hooks move out of Atlas, generating the JSON type
definitions and shipping them to the Atlas server may not be a very convenient
step.
* The other issue is that this registration is essentially a 'trusted'
registration, as it would not be subject to any of the authorization that we
are planning to build (see ATLAS-497). For reserved types, this might not be a
major issue.
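
To make the bootstrap step concrete, here is a minimal sketch of what the
startup hook could look like. The bootstrap directory argument, the
TypeRegistry interface and its createTypesIfAbsent method are placeholder
names assumed for illustration, not actual Atlas APIs; the only point is
'scan a well-known directory for JSON files and register the types
idempotently'.

{code:java}
// Sketch only - illustrates the proposed bootstrap step, not actual Atlas code.
// TypeRegistry and createTypesIfAbsent are assumed placeholder names for
// whatever the type system actually exposes.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TypeBootstrapper {

    /** Minimal view of what the bootstrap step needs from the type system. */
    public interface TypeRegistry {
        /** Registers the types in the given JSON, skipping ones that already exist. */
        void createTypesIfAbsent(String typesDefJson);
    }

    private final TypeRegistry typeRegistry;

    public TypeBootstrapper(TypeRegistry typeRegistry) {
        this.typeRegistry = typeRegistry;
    }

    /** Invoked once during server startup, before requests are served. */
    public void bootstrap(String bootstrapDir) throws IOException {
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(Paths.get(bootstrapDir), "*.json")) {
            for (Path file : files) {
                String typesDefJson =
                    new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                // Idempotent: restarting the server with the same files is a no-op.
                typeRegistry.createTypesIfAbsent(typesDefJson);
            }
        }
    }
}
{code}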

Regarding Option 2:
[~shwethags] pointed out that the Kafka topic ATLAS_HOOK is currently not set
up with multiple partitions. However, for better performance we will need to
increase the number of partitions. Once that is done, consuming type and entity
messages becomes more complicated if they are produced together: they will need
to be produced into the same partition for ordering to hold, or there will have
to be some synchronization at the consumer level.
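
For illustration, here is a minimal producer-side sketch, assuming the new
Kafka producer API and string-serialized JSON payloads. The class name and
the choice of key (the source component name) are assumptions made for the
sketch, not something agreed in this JIRA; the point is only that giving
related type and entity messages the same key hashes them to the same
ATLAS_HOOK partition, so a consumer sees them in the order they were produced.

{code:java}
// Sketch only - shows how a hook could keep related type and entity messages
// in one ATLAS_HOOK partition by keying them identically. The key choice is
// an assumption made for illustration.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HookNotificationSender {

    private static final String TOPIC = "ATLAS_HOOK";

    private final Producer<String, String> producer;

    public HookNotificationSender(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    /**
     * Kafka's default partitioner hashes the key, so messages sharing a key
     * land in the same partition and are consumed in the order produced.
     */
    public void send(String sourceComponent, String messageJson) {
        producer.send(new ProducerRecord<>(TOPIC, sourceComponent, messageJson));
    }

    public void close() {
        producer.close();
    }
}
{code}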

> Decouple currently integrating components from availability of Atlas service
> for raising metadata events
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-512
>                 URL: https://issues.apache.org/jira/browse/ATLAS-512
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>
> The components that currently integrate with Atlas (Hive, Sqoop, Falcon, 
> Storm) all communicate their metadata events using Kafka as a messaging 
> layer. This effectively decouples these components from the Atlas server. 
> However, all of these components have some initialization that checks if 
> their respective models are registered with Atlas. For components that 
> integrate on the server, like HiveServer2 and Falcon, this initialization is 
> a one time check and hence, is manageable. Others like Sqoop, Storm and the 
> Hive CLI are client side components and hence the initialization happens for 
> every run or session of these components. Invoking the initialization (and 
> the one time check) every time like this effectively means that the Atlas 
> server should be always available.
> This JIRA is to try and remove this dependency and thus truly decouple these 
> components.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
