Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-07 Thread Shwetha GS

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127565
---


Ship It!

- Shwetha GS


On April 6, 2016, 7:08 p.m., Suma Shivaprasad wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45784/
> ---
> 
> (Updated April 6, 2016, 7:08 p.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-527
> https://issues.apache.org/jira/browse/ATLAS-527
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Added support for tracking lineage between HDFS paths and Hive tables in:
> 
> a. LOAD (at table and partition level) - input is an HDFS path and output is 
> the table. Even though we don't create partition entities, we still track 
> lineage at the table level for partitions. This could be an issue if there 
> are a large number of partition queries, which is not addressed in this 
> JIRA; see https://issues.apache.org/jira/browse/ATLAS-619. Refer to 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> b. IMPORT, EXPORT to and from HDFS paths - refer to 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> c. CREATE EXTERNAL TABLE - input is the HDFS path and output is the table.
> d. ALTER TABLE LOCATION for an external table - input is the new HDFS path 
> and output is the table.
> 
> Also changed the ordering of model registration by sorting the models by 
> modifiedTime to ensure they are registered in the correct order.
> 
> 
> Diffs
> -
> 
>   
>   addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565
>   addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8
>   addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856
>   client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9
>   repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b
> 
> Diff: https://reviews.apache.org/r/45784/diff/
> 
> 
> Testing
> ---
> 
> Added tests in HiveHookIT
> 
> 
> Thanks,
> 
> Suma Shivaprasad
> 
>



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-06 Thread Suma Shivaprasad


> On April 6, 2016, 11:21 a.m., Shwetha GS wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 480
> > 
> >
> > We need to fix the clusterName mess later - can't pick up the HDFS 
> > cluster name from the Hive conf.

Removed it for now since we don't know the right clusterName.


- Suma


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127310
---





Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-06 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
---

(Updated April 6, 2016, 5:44 p.m.)


Review request for atlas.


Changes
---

Fixed review comments


Bugs: ATLAS-527
https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Description
---

Added support for tracking lineage between HDFS paths and Hive tables in:

a. LOAD (at table and partition level) - input is an HDFS path and output is 
the table. Even though we don't create partition entities, we still track 
lineage at the table level for partitions. This could be an issue if there are 
a large number of partition queries, which is not addressed in this JIRA; see 
https://issues.apache.org/jira/browse/ATLAS-619. Refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
b. IMPORT, EXPORT to and from HDFS paths - refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
c. CREATE EXTERNAL TABLE - input is the HDFS path and output is the table.
d. ALTER TABLE LOCATION for an external table - input is the new HDFS path and 
output is the table.
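
As a rough illustration of the input/output direction for each operation in the list above, here is a minimal sketch. The class, enum, and field names are hypothetical stand-ins chosen for this example; they are not Atlas or Hive types, and the actual hook code in this patch may be organized quite differently.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative only: hypothetical stand-in types, not Atlas or Hive classes. */
final class LineageSketch {
    /** The operations covered by the description. */
    enum Op { LOAD, IMPORT, EXPORT, CREATE_EXTERNAL_TABLE, ALTER_TABLE_LOCATION }

    /** One lineage edge: the operation plus what it reads and what it writes. */
    static final class Edge {
        final Op op;
        final List<String> inputs;
        final List<String> outputs;

        Edge(Op op, List<String> inputs, List<String> outputs) {
            this.op = op;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /**
     * EXPORT reads the table and writes the path; the other operations read
     * the path and write the table. A partition-level LOAD still names the
     * table as its output, since no partition entities are created.
     */
    static Edge edgeFor(Op op, String hdfsPath, String table) {
        List<String> path = Collections.singletonList(hdfsPath);
        List<String> tbl = Collections.singletonList(table);
        return op == Op.EXPORT ? new Edge(op, tbl, path) : new Edge(op, path, tbl);
    }
}
```

For instance, `LineageSketch.edgeFor(LineageSketch.Op.LOAD, "hdfs://nn/in", "default.t1")` would yield an edge whose input is the path and whose output is the table (both strings are made-up values).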

Also changed the ordering of model registration by sorting the models by 
modifiedTime to ensure they are registered in the correct order.
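
The ordering idea can be sketched as below. This is only an assumption about the general approach: `ModelFile` and `inRegistrationOrder` are hypothetical names, and the timestamp is a plain `long` here (in practice something like `java.io.File.lastModified()` could supply it).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical names, not the registrar code in this patch. */
final class ModelOrderSketch {
    /** A model definition together with the time it was last modified. */
    static final class ModelFile {
        final String name;
        final long modifiedTime;

        ModelFile(String name, long modifiedTime) {
            this.name = name;
            this.modifiedTime = modifiedTime;
        }
    }

    /** Returns a copy sorted oldest-first, so earlier models are registered first. */
    static List<ModelFile> inRegistrationOrder(List<ModelFile> models) {
        List<ModelFile> sorted = new ArrayList<>(models);
        // List.sort is stable, so models with equal timestamps keep their relative order.
        sorted.sort(Comparator.comparingLong((ModelFile m) -> m.modifiedTime));
        return sorted;
    }
}
```

Sorting oldest-first means a model written earlier is registered before one written later, which helps when the later model refers to types defined by the earlier one.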


Diffs (updated)
-

  
  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b

Diff: https://reviews.apache.org/r/45784/diff/


Testing
---

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-06 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
---

(Updated April 6, 2016, 5:03 p.m.)


Review request for atlas.


Bugs: ATLAS-527
https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Description
---

Added support for tracking lineage between HDFS paths and Hive tables in:

a. LOAD (at table and partition level) - input is an HDFS path and output is 
the table. Even though we don't create partition entities, we still track 
lineage at the table level for partitions. This could be an issue if there are 
a large number of partition queries, which is not addressed in this JIRA; see 
https://issues.apache.org/jira/browse/ATLAS-619. Refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
b. IMPORT, EXPORT to and from HDFS paths - refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
c. CREATE EXTERNAL TABLE - input is the HDFS path and output is the table.
d. ALTER TABLE LOCATION for an external table - input is the new HDFS path and 
output is the table.

Also changed the ordering of model registration by sorting the models by 
modifiedTime to ensure they are registered in the correct order.


Diffs (updated)
-

  
  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b

Diff: https://reviews.apache.org/r/45784/diff/


Testing
---

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-06 Thread Suma Shivaprasad


> On April 6, 2016, 11:21 a.m., Shwetha GS wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 519
> > 
> >
> > Aren't there cases where the input/output is on the local fs, for 
> > example a load from a local path?

I am filtering out the LOCAL_DIR cases by checking getType == DFS_DIR, and 
there are also test cases for LOAD from a local dir and INSERT into a local 
dir, which confirm that this case is handled. You are suggesting that we 
ignore local dirs, right?
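
The check described above can be sketched roughly as follows. The `Type` enum and `Entity` class are simplified stand-ins that reuse the type names from this reply; they are assumptions for illustration, not the Hive hook classes themselves, and `hdfsPaths` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: simplified stand-ins, not the Hive hook entity classes. */
final class PathFilterSketch {
    /** Entity kinds relevant to this discussion. */
    enum Type { TABLE, PARTITION, DFS_DIR, LOCAL_DIR }

    /** A query input or output with its kind and location or name. */
    static final class Entity {
        private final Type type;
        private final String location;

        Entity(Type type, String location) {
            this.type = type;
            this.location = location;
        }

        Type getType() { return type; }
        String getLocation() { return location; }
    }

    /** Keeps only DFS_DIR entities, so local directories never produce path lineage. */
    static List<String> hdfsPaths(List<Entity> entities) {
        List<String> paths = new ArrayList<>();
        for (Entity e : entities) {
            if (e.getType() == Type.DFS_DIR) {
                paths.add(e.getLocation());
            }
        }
        return paths;
    }
}
```

With this shape, a LOAD from a local path contributes no path entity at all, which matches the behaviour the tests mentioned above are said to confirm.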


> On April 6, 2016, 11:21 a.m., Shwetha GS wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 558
> > 
> >
> > This should be part of HiveMetaStoreBridge and should be used in 
> > import-hive as well?
> > 
> > Because this lineage will be created in import-hive, the process name 
> > should be just the table name for create table so that it's created just once.

Initially this was my thought too. However, I am not sure how to get the query 
for the create table itself. I checked how SHOW CREATE TABLE constructs it: it 
is built on the fly and is not stored in the metadata. Also, if we don't 
address this, it will look different from the other lineages, where we always 
have the query in the process. So I did not want to address this now, until we 
figure out how to construct the query itself. Created a separate issue to track 
this - https://issues.apache.org/jira/browse/ATLAS-642


- Suma


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127310
---


On April 5, 2016, 11:58 p.m., Suma Shivaprasad wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45784/
> ---
> 
> (Updated April 5, 2016, 11:58 p.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-527
> https://issues.apache.org/jira/browse/ATLAS-527
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Added support for tracking lineage between HDFS paths and Hive tables in:
> 
> a. LOAD (at table and partition level) - input is an HDFS path and output is 
> the table. Even though we don't create partition entities, we still track 
> lineage at the table level for partitions. This could be an issue if there 
> are a large number of partition queries, which is not addressed in this 
> JIRA; see https://issues.apache.org/jira/browse/ATLAS-619. Refer to 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> b. IMPORT, EXPORT to and from HDFS paths - refer to 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> c. CREATE EXTERNAL TABLE - input is the HDFS path and output is the table.
> d. ALTER TABLE LOCATION for an external table - input is the new HDFS path 
> and output is the table.
> 
> Also changed the ordering of model registration by sorting the models by 
> modifiedTime to ensure they are registered in the correct order.
> 
> 
> Diffs
> -
> 
>   
>   addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565
>   addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73
>   addons/hive-bridge/pom.xml e125f18
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8
>   addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856
>   repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f
>   repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b
> 
> Diff: https://reviews.apache.org/r/45784/diff/
> 
> 
> Testing
> ---
> 
> Added tests in HiveHookIT
> 
> 
> Thanks,
> 
> Suma Shivaprasad
> 
>



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-06 Thread Shwetha GS

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127310
---




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java (line 472)


We need to fix the clusterName mess later - can't pick up the HDFS cluster 
name from the Hive conf.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 220)


The earlier version was more readable. Could you use setter methods instead of 
this long constructor?
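
To illustrate the suggestion, here is a minimal sketch of an event object populated through setters. `HookEventSketch` and its fields are hypothetical and chosen only for this example; the actual class and fields at this line of HiveHook.java are not shown in this thread.

```java
/** Illustrative only: a hypothetical event holder, not the class in HiveHook.java. */
final class HookEventSketch {
    private String user;
    private String operation;
    private String queryText;

    // Each setter names the value it assigns, which a long positional
    // constructor call cannot do at the call site.
    HookEventSketch setUser(String user) {
        this.user = user;
        return this;
    }

    HookEventSketch setOperation(String operation) {
        this.operation = operation;
        return this;
    }

    HookEventSketch setQueryText(String queryText) {
        this.queryText = queryText;
        return this;
    }

    String getUser() { return user; }
    String getOperation() { return operation; }
    String getQueryText() { return queryText; }
}
```

A call site then reads as `new HookEventSketch().setUser("etl").setOperation("LOAD").setQueryText("...")`, so a reviewer can see which argument is which without counting positions.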



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 454)


Aren't there cases where the input/output is on the local fs, for example a 
load from a local path?



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 493)


This should be part of HiveMetaStoreBridge and should be used in 
import-hive as well?

Because this lineage will be created in import-hive, the process name should 
be just the table name for create table so that it's created just once.
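
A rough sketch of why a table-name-based process name leads to a single process: if both the hook and import-hive derive the same name, a registry keyed by that name only stores it once. Everything here (`ProcessRegistrySketch`, the `"CREATETABLE"` operation string, the map-based registry) is a hypothetical illustration and not how Atlas stores entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical in-memory registry, not Atlas storage. */
final class ProcessRegistrySketch {
    private final Map<String, String> processesByName = new LinkedHashMap<>();

    /**
     * For create table the name is just the table name, so it does not depend
     * on query text that import-hive may not have. Other operations keep the query.
     */
    static String processName(String operation, String tableName, String queryText) {
        return "CREATETABLE".equals(operation) ? tableName : queryText;
    }

    /** Registers a process only if its name is new; returns true when it was added. */
    boolean register(String name, String source) {
        return processesByName.putIfAbsent(name, source) == null;
    }

    int size() {
        return processesByName.size();
    }
}
```

In this sketch, the hook (which sees the query) and import-hive (which does not) both compute the same name for a create table, so the second `register` call is a no-op.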



repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java (line 165)


Use these in Process type definition. 

Actually, these should be in AtlasClient?


- Shwetha GS





Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-05 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
---

(Updated April 5, 2016, 11:58 p.m.)


Review request for atlas.


Bugs: ATLAS-527
https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Description (updated)
---

Added support for tracking lineage between HDFS paths and Hive tables in:

a. LOAD (at table and partition level) - input is an HDFS path and output is 
the table. Even though we don't create partition entities, we still track 
lineage at the table level for partitions. This could be an issue if there are 
a large number of partition queries, which is not addressed in this JIRA; see 
https://issues.apache.org/jira/browse/ATLAS-619. Refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
b. IMPORT, EXPORT to and from HDFS paths - refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
c. CREATE EXTERNAL TABLE - input is the HDFS path and output is the table.
d. ALTER TABLE LOCATION for an external table - input is the new HDFS path and 
output is the table.

Also changed the ordering of model registration by sorting the models by 
modifiedTime to ensure they are registered in the correct order.


Diffs
-

  
  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73
  addons/hive-bridge/pom.xml e125f18
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b

Diff: https://reviews.apache.org/r/45784/diff/


Testing
---

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

2016-04-05 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
---

(Updated April 5, 2016, 11:54 p.m.)


Review request for atlas.


Bugs: ATLAS-527
https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Description
---

Added support for tracking lineage between HDFS paths and Hive tables in:

a. LOAD (at table and partition level) - input is an HDFS path and output is 
the table. Even though we don't create partition entities, we still track 
lineage at the table level for partitions. This could be an issue if there are 
a large number of partition queries, which is not addressed in this JIRA; see 
https://issues.apache.org/jira/browse/ATLAS-619. Refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
b. IMPORT, EXPORT to and from HDFS paths - refer to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
c. CREATE EXTERNAL TABLE - input is the HDFS path and output is the table.
d. ALTER TABLE LOCATION for an external table - input is the new HDFS path and 
output is the table.


Diffs
-

  
  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73
  addons/hive-bridge/pom.xml e125f18
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b

Diff: https://reviews.apache.org/r/45784/diff/


Testing (updated)
---

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad