[ https://issues.apache.org/jira/browse/ATLAS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167236#comment-15167236 ]
Hemanth Yamijala commented on ATLAS-512:
----------------------------------------

A few more discussions happened today, so updating with the latest thinking.

Regarding option 1: It has been pointed out that this option still requires the Atlas server to be up for type registration. One potential way of managing this is to slightly modify the mechanics of option 1:
* Suppose we provide a facility to 'bootstrap' the Atlas server with a set of types. This can be done by simply dropping the bootstrap type definitions as JSON files into a well-known directory.
* On startup, the Atlas server can read these and register the types, provided they are not already registered.
* System-level and 'reserved' types, such as the models representing the Hadoop components (Hive, Falcon, Sqoop etc.), can be pre-registered this way, one time.
* For the reserved types, the JSON can be generated as part of the Atlas build itself.
* As [~bergenholtz] mentions, this has the added benefit that it prevents erroneous registration of reserved types.
* The issue I see with this approach is that it splits the type and entity management of a component across two systems - the hooks and the Atlas server. For instance, if the hooks move out of Atlas, generating the JSON type definitions and including them at the Atlas server may not be a very convenient step.
* The other issue is that this is essentially a 'trusted' registration, as it would not be subject to any authorization that we are planning to build (see ATLAS-497). For reserved types, this might not be a major issue.

Regarding option 2: [~shwethags] pointed out that the Kafka topic ATLAS_HOOK is currently not set up with multiple partitions. For better performance, we will need to increase the number of partitions, and once that is done, consumption becomes more complicated if type and entity messages are produced together.
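To illustrate the ordering concern: Kafka only guarantees relative order within a single partition, and partition assignment for keyed messages is a hash of the key. The sketch below is a stand-in for the Kafka client's default partitioner (the real Java client hashes key bytes with murmur2; the hash here is illustrative only), and the key name "hive-hook" is a made-up example, not an Atlas identifier. It shows that giving the type message and the entity message the same key always lands them in the same partition, so a single consumer sees them in production order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: if the type-definition message and the entity message for the same
// hook share a message key, any key-hash partitioner sends them to the same
// partition, preserving their relative order for the consumer.
// (Illustrative stand-in for Kafka's default partitioner, which hashes the
// key bytes with murmur2; the hash below is NOT Kafka's actual algorithm.)
public class KeyedPartitioning {
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        // Mask the sign bit so the modulo result is a valid partition index.
        return (Arrays.hashCode(bytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 4; // hypothetical partition count for ATLAS_HOOK
        // Both messages keyed by the originating hook ("hive-hook" is made up):
        int typeMsgPartition   = partitionFor("hive-hook", partitions);
        int entityMsgPartition = partitionFor("hive-hook", partitions);
        // Same key => same partition => relative order preserved.
        System.out.println(typeMsgPartition == entityMsgPartition); // true
    }
}
```

Messages produced with different (or null) keys carry no such guarantee, which is why producing type and entity messages independently across a multi-partition topic would need extra synchronization on the consumer side.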
They will need to be produced into the same partition for things to work correctly, or there will need to be some synchronization at the consumer level.

> Decouple currently integrating components from availability of Atlas service for raising metadata events
> ---------------------------------------------------------------------------
>
>                 Key: ATLAS-512
>                 URL: https://issues.apache.org/jira/browse/ATLAS-512
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>
> The components that currently integrate with Atlas (Hive, Sqoop, Falcon, Storm) all communicate their metadata events using Kafka as a messaging layer. This effectively decouples these components from the Atlas server. However, all of these components have some initialization that checks whether their respective models are registered with Atlas. For components that integrate on the server side, like HiveServer2 and Falcon, this initialization is a one-time check and hence is manageable. Others, like Sqoop, Storm and the Hive CLI, are client-side components, so the initialization happens for every run or session of these components. Invoking the initialization (and the one-time check) every time like this effectively means that the Atlas server must always be available.
>
> This JIRA is to try and remove this dependency and thus truly decouple these components.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)