[jira] [Commented] (ATLAS-1661) import hive script to handle updates like rename/delete

2017-03-15 Thread Madhan Neethiraj (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925647#comment-15925647
 ] 

Madhan Neethiraj commented on ATLAS-1661:
-

[~ssainath] - import-hive is generally meant for one-time use, to update Atlas 
with the information from Hive metastore. It is not expected to be run 
multiple times; even if it is, the tool wouldn't be able to handle renames, 
as this information might not be available in Hive metastore.
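
To illustrate the point about renames: a minimal sketch, assuming a
hypothetical tool that keeps the table names seen in two successive snapshots
(import-hive itself does not necessarily keep anything like this). The helper
name and the table names are made up for illustration. Comparing the two
snapshots only shows that one name disappeared and another appeared, which
looks the same for a rename as for a drop followed by a create.

```python
def diff_snapshots(before, after):
    """Compare the table names seen in two successive metadata snapshots.

    Hypothetical helper for illustration only; not part of import-hive.
    Returns (removed, added) as sorted lists of names.
    """
    removed = sorted(before - after)
    added = sorted(after - before)
    return removed, added


# Snapshot taken before and after "ALTER TABLE table1 RENAME TO table1_new".
before = {"default.table1", "default.orders"}
after = {"default.table1_new", "default.orders"}

removed, added = diff_snapshots(before, after)

# Nothing in the result links the removed name to the added one, so a rename
# cannot be told apart from DROP TABLE table1 plus CREATE TABLE table1_new.
print("removed:", removed)
print("added:", added)
```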

> import hive script to handle updates like rename/delete
> ---
>
> Key: ATLAS-1661
> URL: https://issues.apache.org/jira/browse/ATLAS-1661
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-intg
> Reporter: Sharmadha Sainath
> Priority: Minor
>
> 1. Disabled the Hive hook.
> 2. Created table table1.
> 3. Ran the import-hive.sh script; Atlas ingested table1.
> 4. Altered table table1, renaming it to table1_new.
> 5. Ran the import-hive.sh script again; Atlas created a new table, table1_new.
> table1 wasn't updated with the new name.
> This is the expected behavior of the import-hive script as opposed to the Hive 
> hook, as the Hive hook is synchronous and import-hive is not.
> But a customer who runs import-hive.sh multiple times while doing many Hive 
> operations may see inconsistencies when applying Ranger policies to the 
> table, and in many other scenarios, since it is not documented that the 
> import-hive script should be run only once.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ATLAS-1661) import hive script to handle updates like rename/delete

2017-03-14 Thread Ayub Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924417#comment-15924417
 ] 

Ayub Khan commented on ATLAS-1661:
--

[~madhan.neethiraj] Is this an expectation from the customer? Could you please 
confirm?



[jira] [Commented] (ATLAS-1661) import hive script to handle updates like rename/delete

2017-03-14 Thread Sharmadha Sainath (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924301#comment-15924301
 ] 

Sharmadha Sainath commented on ATLAS-1661:
--

[~ayubkhan] 
>> I believe the intent of import-hive.sh tool is not to track all the metadata 
>> changes but to take a snapshot of the metadata at that point of time.
I completely agree with you. That is the intent. Currently the import-hive 
script looks up the table by its qualified name and, if it is already present, 
updates it. In the case mentioned in the description, it doesn't find the 
table table1_new, so it creates a new table. But it would be a good-to-have 
feature if the import-hive script had a mechanism to know the history of 
table1_new and update accordingly.
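
The lookup-then-update behaviour described above can be sketched roughly as
follows. This is a toy in-memory model, not Atlas code: the ToyCatalog class,
its import_table method and the qualified-name strings are all assumptions
made for illustration. It only shows why an import keyed on the qualified name
updates an existing entry on a re-run but creates a second entry after a
rename.

```python
import itertools


class ToyCatalog:
    """Toy stand-in for a metadata store keyed by qualified name (hypothetical)."""

    def __init__(self):
        self.by_qualified_name = {}
        self._ids = itertools.count(1)

    def import_table(self, qualified_name, attributes):
        """Update the entity with this qualified name, or create it if absent."""
        entity = self.by_qualified_name.get(qualified_name)
        if entity is None:
            # No entity under this name: the import has no way to know the
            # table used to exist under a different name, so it creates one.
            entity = {"id": next(self._ids), "qualifiedName": qualified_name}
            self.by_qualified_name[qualified_name] = entity
        entity["attributes"] = dict(attributes)
        return entity["id"]


catalog = ToyCatalog()

# First import, then a re-run with the table unchanged in name.
first = catalog.import_table("default.table1@cluster", {"owner": "hive"})
again = catalog.import_table("default.table1@cluster", {"owner": "etl"})

# Import after the table was renamed in Hive: a different qualified name.
renamed = catalog.import_table("default.table1_new@cluster", {"owner": "etl"})

print(first == again)                   # same entity was updated in place
print(renamed != first)                 # rename produced a separate entity
print(sorted(catalog.by_qualified_name))
```

In this model the old entry for table1 is never touched by the second import,
which matches the stale table1 described in the issue.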

>> Are you suggesting to have the hiveHook capability built into import-hive.sh 
>> tool also?
Yes, because that would be the expectation from the customer. The only 
difference the customer would know is that the Hive hook updates as and when a 
query is fired, and the import-hive script does a bulk update when run.



[jira] [Commented] (ATLAS-1661) import hive script to handle updates like rename/delete

2017-03-14 Thread Ayub Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924240#comment-15924240
 ] 

Ayub Khan commented on ATLAS-1661:
--

import-hive.sh is a standalone (not daemon) tool that takes a metadata snapshot 
when triggered and posts it to Atlas; this is the primary targeted use case of 
the tool. The import tool does not have details of previous changes to the same 
table, so you won't see alter-table updates reflected by the import command.

HiveHook (runs as a daemon) is configured to track metadata changes to 
Hive entities and post all the changes/updates to Atlas. This use case is 
handled by default, as HiveHook is enabled when Atlas is deployed.

I believe the intent of import-hive.sh tool is not to track all the metadata 
changes but to take a snapshot of the metadata at that point of time.

[~ssainath] Are you suggesting to have the hiveHook capability built into 
import-hive.sh tool also? 

[~madhan.neethiraj] [~suryakoneru]
