[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457138
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:38
Start Date: 10/Jul/20 12:38
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452816256



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1754,6 +1760,16 @@ public void testForeignKeys() {
 Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2");
 Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME);
 
+cachedKeys = sharedCache.listCachedForeignKeys(
+DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), 
tbl1.getDbName(), tbl1.getTableName());
+
+Assert.assertEquals(cachedKeys.size(), 1);
+Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2");
+Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db");
+Assert.assertEquals(cachedKeys.get(0).getFktable_name(), 
tbl.getTableName());
+Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1");

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 457138)
Time Spent: 4.5h  (was: 4h 20m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from tables involved in a query, which results in multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457136
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:37
Start Date: 10/Jul/20 12:37
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452816142



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -261,44 +283,57 @@ public int getObjectSize(Class clazz, Object obj) {
 private Map parameters;
 private byte[] sdHash;
 private int otherSize;
-private int tableColStatsCacheSize;
-private int partitionCacheSize;
-private int partitionColStatsCacheSize;
-private int aggrColStatsCacheSize;
+
+// Arrays to hold the size/updated bit of cached objects.
+// These arrays are to be referenced using MemberName enum only.
+private int[] memberObjectsSize = new int[MemberName.values().length];
+private AtomicBoolean[] memberCacheUpdated = new 
AtomicBoolean[MemberName.values().length];
 
 private ReentrantReadWriteLock tableLock = new 
ReentrantReadWriteLock(true);
 // For caching column stats for an unpartitioned table
 // Key is column name and the value is the col stat object
 private Map tableColStatsCache = new 
ConcurrentHashMap();
-private AtomicBoolean isTableColStatsCacheDirty = new AtomicBoolean(false);
 // For caching partition objects
 // Ket is partition values and the value is a wrapper around the partition 
object
 private Map partitionCache = new 
ConcurrentHashMap();
-private AtomicBoolean isPartitionCacheDirty = new AtomicBoolean(false);
 // For caching column stats for a partitioned table
 // Key is aggregate of partition values, column name and the value is the 
col stat object
 private Map partitionColStatsCache =
 new ConcurrentHashMap();
-private AtomicBoolean isPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
 // For caching aggregate column stats for all and all minus default 
partition
 // Key is column name and the value is a list of 2 col stat objects
 // (all partitions and all but default)
 private Map> aggrColStatsCache =
 new ConcurrentHashMap>();
-private AtomicBoolean isAggrPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
+
+private Map primaryKeyCache = new 
ConcurrentHashMap<>();
+
+private Map foreignKeyCache = new 
ConcurrentHashMap<>();
+
+private Map notNullConstraintCache = new 
ConcurrentHashMap<>();
+
+private Map uniqueConstraintCache = new 
ConcurrentHashMap<>();
 
 TableWrapper(Table t, byte[] sdHash, String location, Map 
parameters) {
   this.t = t;
   this.sdHash = sdHash;
   this.location = location;
   this.parameters = parameters;
-  this.tableColStatsCacheSize = 0;
-  this.partitionCacheSize = 0;
-  this.partitionColStatsCacheSize = 0;
-  this.aggrColStatsCacheSize = 0;
+  for(MemberName mn : MemberName.values()) {
+this.memberObjectsSize[mn.getValue()] = 0;

Review comment:
   On second thought, I think ordinal is better: we load cache entries afresh 
during HMS startup, so the ordering doesn't matter. Hand-assigned values, 
however, can be a problem if someone passes an incorrect value or removes an 
element without updating the other values.
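
   The ordinal-based indexing preferred in this comment could look roughly like
the sketch below. The enum and class names (`CacheMember`, `OrdinalIndexedSizes`,
`EnumMapSizes`) are hypothetical and are not taken from the pull request; the
`EnumMap` variant is an additional alternative, not something proposed in the review.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum; the constant names are illustrative only.
enum CacheMember {
  PARTITION_CACHE,
  PRIMARY_KEY_CACHE,
  FOREIGN_KEY_CACHE
}

final class OrdinalIndexedSizes {
  // ordinal() is always in [0, values().length), so this array cannot be
  // indexed out of bounds, whatever order the constants are declared in.
  private final int[] sizes = new int[CacheMember.values().length];

  void add(CacheMember member, int delta) {
    sizes[member.ordinal()] += delta;
  }

  int get(CacheMember member) {
    return sizes[member.ordinal()];
  }
}

final class EnumMapSizes {
  // EnumMap hides the index entirely; missing entries are treated as size 0.
  private final Map<CacheMember, Integer> sizes = new EnumMap<>(CacheMember.class);

  void add(CacheMember member, int delta) {
    sizes.merge(member, delta, Integer::sum);
  }

  int get(CacheMember member) {
    return sizes.getOrDefault(member, 0);
  }
}
```

   Because the arrays are rebuilt in memory on every startup, a reordered enum
only changes which slot a member uses; nothing persisted depends on the index.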







Issue Time Tracking
---

Worklog Id: (was: 457136)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457109
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:16
Start Date: 10/Jul/20 12:16
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452806379



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1754,6 +1760,16 @@ public void testForeignKeys() {
 Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2");
 Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME);
 
+cachedKeys = sharedCache.listCachedForeignKeys(
+DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), 
tbl1.getDbName(), tbl1.getTableName());
+
+Assert.assertEquals(cachedKeys.size(), 1);
+Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2");
+Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db");
+Assert.assertEquals(cachedKeys.get(0).getFktable_name(), 
tbl.getTableName());
+Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1");

Review comment:
   Also validate that the parent table key is correct.







Issue Time Tracking
---

Worklog Id: (was: 457109)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457110
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:16
Start Date: 10/Jul/20 12:16
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452806379



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1754,6 +1760,16 @@ public void testForeignKeys() {
 Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col2");
 Assert.assertEquals(cachedKeys.get(0).getCatName(), DEFAULT_CATALOG_NAME);
 
+cachedKeys = sharedCache.listCachedForeignKeys(
+DEFAULT_CATALOG_NAME, tbl.getDbName(), tbl.getTableName(), 
tbl1.getDbName(), tbl1.getTableName());
+
+Assert.assertEquals(cachedKeys.size(), 1);
+Assert.assertEquals(cachedKeys.get(0).getFk_name(), "fk2");
+Assert.assertEquals(cachedKeys.get(0).getFktable_db(), "db");
+Assert.assertEquals(cachedKeys.get(0).getFktable_name(), 
tbl.getTableName());
+Assert.assertEquals(cachedKeys.get(0).getFkcolumn_name(), "col1");

Review comment:
   Also validate that the parent table name is correct.







Issue Time Tracking
---

Worklog Id: (was: 457110)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457105
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 12:01
Start Date: 10/Jul/20 12:01
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452799879



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -261,44 +283,57 @@ public int getObjectSize(Class clazz, Object obj) {
 private Map parameters;
 private byte[] sdHash;
 private int otherSize;
-private int tableColStatsCacheSize;
-private int partitionCacheSize;
-private int partitionColStatsCacheSize;
-private int aggrColStatsCacheSize;
+
+// Arrays to hold the size/updated bit of cached objects.
+// These arrays are to be referenced using MemberName enum only.
+private int[] memberObjectsSize = new int[MemberName.values().length];
+private AtomicBoolean[] memberCacheUpdated = new 
AtomicBoolean[MemberName.values().length];
 
 private ReentrantReadWriteLock tableLock = new 
ReentrantReadWriteLock(true);
 // For caching column stats for an unpartitioned table
 // Key is column name and the value is the col stat object
 private Map tableColStatsCache = new 
ConcurrentHashMap();
-private AtomicBoolean isTableColStatsCacheDirty = new AtomicBoolean(false);
 // For caching partition objects
 // Ket is partition values and the value is a wrapper around the partition 
object
 private Map partitionCache = new 
ConcurrentHashMap();
-private AtomicBoolean isPartitionCacheDirty = new AtomicBoolean(false);
 // For caching column stats for a partitioned table
 // Key is aggregate of partition values, column name and the value is the 
col stat object
 private Map partitionColStatsCache =
 new ConcurrentHashMap();
-private AtomicBoolean isPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
 // For caching aggregate column stats for all and all minus default 
partition
 // Key is column name and the value is a list of 2 col stat objects
 // (all partitions and all but default)
 private Map> aggrColStatsCache =
 new ConcurrentHashMap>();
-private AtomicBoolean isAggrPartitionColStatsCacheDirty = new 
AtomicBoolean(false);
+
+private Map primaryKeyCache = new 
ConcurrentHashMap<>();
+
+private Map foreignKeyCache = new 
ConcurrentHashMap<>();
+
+private Map notNullConstraintCache = new 
ConcurrentHashMap<>();
+
+private Map uniqueConstraintCache = new 
ConcurrentHashMap<>();
 
 TableWrapper(Table t, byte[] sdHash, String location, Map 
parameters) {
   this.t = t;
   this.sdHash = sdHash;
   this.location = location;
   this.parameters = parameters;
-  this.tableColStatsCacheSize = 0;
-  this.partitionCacheSize = 0;
-  this.partitionColStatsCacheSize = 0;
-  this.aggrColStatsCacheSize = 0;
+  for(MemberName mn : MemberName.values()) {
+this.memberObjectsSize[mn.getValue()] = 0;

Review comment:
   Java treats enum constants as objects, and array indexes must be integers, 
so I can't index the arrays with the enum directly and use mn.getValue() instead. 
   
   PS: Enum also provides the `ordinal` method, which returns the position of the 
enum member, but that can cause issues if the order is changed. So I decided to 
go ahead with my own getValue() method.
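
   A minimal sketch of the explicit-value approach described in this comment,
assuming a small enum with hand-assigned indexes. `MemberSlot` and `MemberSizes`
are hypothetical names standing in for `MemberName` and `TableWrapper`; the
constants and values are illustrative, not copied from the pull request.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the MemberName enum discussed above.
enum MemberSlot {
  TABLE_COL_STATS_CACHE(0),
  PARTITION_CACHE(1),
  PRIMARY_KEY_CACHE(2);

  private final int value;

  MemberSlot(int value) {
    this.value = value;
  }

  // Hand-assigned array index; it must stay unique and below values().length.
  int getValue() {
    return value;
  }
}

final class MemberSizes {
  // One slot per enum constant, indexed by the hand-assigned value.
  private final int[] sizes = new int[MemberSlot.values().length];
  private final AtomicBoolean[] updated = new AtomicBoolean[MemberSlot.values().length];

  MemberSizes() {
    for (MemberSlot slot : MemberSlot.values()) {
      sizes[slot.getValue()] = 0;
      // A new AtomicBoolean[] holds nulls, so every slot needs its own instance.
      updated[slot.getValue()] = new AtomicBoolean(false);
    }
  }

  void addSize(MemberSlot slot, int delta) {
    sizes[slot.getValue()] += delta;
    updated[slot.getValue()].set(true);
  }

  int size(MemberSlot slot) {
    return sizes[slot.getValue()];
  }

  boolean isUpdated(MemberSlot slot) {
    return updated[slot.getValue()].get();
  }
}
```

   The trade-off is that the values survive a reordering of the constants, but
a duplicate or out-of-range value is only caught at run time.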







Issue Time Tracking
---

Worklog Id: (was: 457105)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457096
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:11
Start Date: 10/Jul/20 11:11
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452779486



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -470,6 +484,107 @@ boolean cachePartitions(Iterable parts, 
SharedCache sharedCache, bool
   }
 }
 
+boolean cachePrimaryKeys(List primaryKeys, boolean 
fromPrewarm) {
+  return cacheConstraints(primaryKeys, fromPrewarm, 
MemberName.PRIMARY_KEY_CACHE);
+}
+
+boolean cacheForeignKeys(List foreignKeys, boolean 
fromPrewarm) {
+  return cacheConstraints(foreignKeys, fromPrewarm, 
MemberName.FOREIGN_KEY_CACHE);
+}
+
+boolean cacheUniqueConstraints(List 
uniqueConstraints, boolean fromPrewarm) {
+  return cacheConstraints(uniqueConstraints, fromPrewarm, 
MemberName.UNIQUE_CONSTRAINT_CACHE);
+}
+
+boolean cacheNotNullConstraints(List 
notNullConstraints, boolean fromPrewarm) {
+  return cacheConstraints(notNullConstraints, fromPrewarm, 
MemberName.NOTNULL_CONSTRAINT_CACHE);
+}
+
+// Common method to cache constraints
+private boolean cacheConstraints(List constraintsList,
+ boolean fromPrewarm,
+ MemberName mn) {
+  if (constraintsList == null || constraintsList.isEmpty()) {
+return true;
+  }
+  try {
+tableLock.writeLock().lock();
+final int[] size = {0};

Review comment:
   This variable is updated inside a lambda, and a lambda can only capture 
local variables that are final or effectively final. A plain int or Integer 
therefore can't be reassigned there, so I chose a single-element int array 
instead.
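
   The capture rule behind this comment can be sketched as follows. The class
and method names are hypothetical, and `sizeOf` merely stands in for
`getObjectSize(...)` by using the string length; the stream variant is an
alternative not discussed in the review.

```java
import java.util.List;

final class LambdaCounter {
  // Stand-in for getObjectSize(...) in the diff; the "size" is the string length.
  static int sizeOf(String constraint) {
    return constraint.length();
  }

  // The array reference is final, but its single element can still be mutated
  // from inside the lambda. Declaring "int size = 0" and writing "size += ..."
  // in the lambda would not compile.
  static int totalWithArrayHolder(List<String> constraints) {
    final int[] size = {0};
    constraints.forEach(c -> size[0] += sizeOf(c));
    return size[0];
  }

  // Alternative that avoids mutable captured state altogether.
  static int totalWithStream(List<String> constraints) {
    return constraints.stream().mapToInt(LambdaCounter::sizeOf).sum();
  }
}
```

   The array holder is not thread-safe on its own; in the diff the update
happens while the table write lock is held.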







Issue Time Tracking
---

Worklog Id: (was: 457096)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457095
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:07
Start Date: 10/Jul/20 11:07
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r45293



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2490,26 +2616,99 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+if (keys == null || keys.isEmpty()) {

Review comment:
   Created a follow up jira. 
https://issues.apache.org/jira/browse/HIVE-23834







Issue Time Tracking
---

Worklog Id: (was: 457095)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=457093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-457093
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 10/Jul/20 11:02
Start Date: 10/Jul/20 11:02
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r452771399



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -514,6 +629,131 @@ public boolean containsPartition(List partVals) {
   return containsPart;
 }
 
+public void removeConstraint(String name) {
+  try {
+tableLock.writeLock().lock();
+Object constraint = null;
+MemberName mn = null;
+Class constraintClass = null;
+name = name.toLowerCase();
+if (this.primaryKeyCache.containsKey(name)) {
+  constraint = this.primaryKeyCache.remove(name);
+  mn = MemberName.PRIMARY_KEY_CACHE;
+  constraintClass = SQLPrimaryKey.class;
+} else if (this.foreignKeyCache.containsKey(name)) {
+  constraint = this.foreignKeyCache.remove(name);
+  mn = MemberName.FOREIGN_KEY_CACHE;
+  constraintClass = SQLForeignKey.class;
+} else if (this.notNullConstraintCache.containsKey(name)) {
+  constraint = this.notNullConstraintCache.remove(name);
+  mn = MemberName.NOTNULL_CONSTRAINT_CACHE;
+  constraintClass = SQLNotNullConstraint.class;
+} else if (this.uniqueConstraintCache.containsKey(name)) {
+  constraint = this.uniqueConstraintCache.remove(name);
+  mn = MemberName.UNIQUE_CONSTRAINT_CACHE;
+  constraintClass = SQLUniqueConstraint.class;
+}
+
+if(constraint == null) {
+  LOG.debug("Constraint: " + name + " does not exist in cache.");
+  return;
+}
+setMemberCacheUpdated(mn, true);
+int size = getObjectSize(constraintClass, constraint);
+updateMemberSize(mn, -1 * size, SizeMode.Delta);
+  } finally {
+tableLock.writeLock().unlock();
+  }
+}
+
+public void refreshPrimaryKeys(List keys) {
+  Map newKeys = new ConcurrentHashMap<>();
+  try {
+tableLock.writeLock().lock();
+int size = 0;
+for (SQLPrimaryKey key : keys) {
+  if (compareAndSetMemberCacheUpdated(MemberName.PRIMARY_KEY_CACHE, 
true, false)) {
+LOG.debug("Skipping primary key cache update for table: " + 
getTable().getTableName()
++ "; the primary keys we have is dirty.");
+return;
+  }
+  newKeys.put(key.getPk_name().toLowerCase(), key);
+  size += getObjectSize(SQLPrimaryKey.class, key);
+}
+primaryKeyCache = newKeys;
+updateMemberSize(MemberName.PRIMARY_KEY_CACHE, size, 
SizeMode.Snapshot);
+LOG.debug("Primary keys refresh in cache was successful.");

Review comment:
   Please add the catalog, db and table names to the log message; otherwise it 
is of no use. The same applies to the other methods.
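
   One way to act on this comment is to build the message from the fully
qualified table name. The helper below is a hypothetical sketch that uses plain
string formatting so that it stays self-contained; it does not reproduce the
logging calls in SharedCache.

```java
final class CacheLogMessages {
  // Builds "<catalog>.<db>.<table>" for use in log messages (hypothetical helper).
  static String qualifiedName(String catName, String dbName, String tblName) {
    return catName + "." + dbName + "." + tblName;
  }

  // Message for a successful refresh of one kind of cached member.
  static String refreshSucceeded(String memberKind, String catName, String dbName,
      String tblName) {
    return String.format("%s refresh in cache was successful for %s.",
        memberKind, qualifiedName(catName, dbName, tblName));
  }
}
```

   With a logger that supports placeholders, the same information can be passed
as arguments instead of being concatenated up front.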

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2490,26 +2616,99 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+if (keys == null || keys.isEmpty()) {

Review comment:
   Can we have a flag in TableWrapper in the cache to tell whether the keys 
were set or not? This can be a follow-up jira.
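
   The flag asked for here would let the caller distinguish "constraints never
loaded" from "loaded, and the table has none". The following is only a sketch
of that idea under assumed names (`ConstraintSlot`, `populate`, `list`);
constraints are represented as plain strings, and the locking that TableWrapper
performs is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class ConstraintSlot {
  // Constraint name -> constraint (a String here, purely for illustration).
  private final Map<String, String> byName = new ConcurrentHashMap<>();
  // True once the slot has been filled, even if it was filled with nothing.
  private volatile boolean populated = false;

  // Replaces the cached constraints. Callers are assumed to serialize writes,
  // as the diff does with the table write lock.
  void populate(Map<String, String> constraints) {
    byName.clear();
    byName.putAll(constraints);
    populated = true;
  }

  // Optional.empty() means "not cached, ask the raw store";
  // a present but empty list means "cached, and there are no constraints".
  Optional<List<String>> list() {
    if (!populated) {
      return Optional.empty();
    }
    return Optional.of(new ArrayList<>(byName.values()));
  }
}
```

   A caller would fall back to the raw store only when `list()` is empty, rather
than whenever the cached list happens to have no entries.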

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -470,6 +484,107 @@ boolean cachePartitions(Iterable parts, 
SharedCache sharedCache, bool
   }
 }
 
+boolean cachePrimaryKeys(List primaryKeys, boolean 
fromPrewarm) {
+  return cacheConstraints(primaryKeys, fromPrewarm, 
MemberName.PRIMARY_KEY_CACHE);
+}
+
+boolean cacheForeignKeys(L

[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=455219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455219
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 07/Jul/20 04:06
Start Date: 07/Jul/20 04:06
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r450599566



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+
+return keys;
   }
 
   @Override public List getForeignKeys(String catName, String 
parentDbName, String parentTblName,
   String foreignDbName, String foreignTblName) throws MetaException {
-// TODO constraintCache
-return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);
+ // Get correct ForeignDBName and TableName
+if (foreignDbName == null || foreignTblName == null) {
+  return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);

Review comment:
   Created https://issues.apache.org/jira/browse/HIVE-23810 for followup.







Issue Time Tracking
---

Worklog Id: (was: 455219)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=455218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455218
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 07/Jul/20 04:00
Start Date: 07/Jul/20 04:00
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r450598395



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  cacheObjects.setPrimaryKeys(primaryKeys);
+
+  Deadline.startTimer("getForeignKeys");
+  foreignKeys = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);

Review comment:
   Current usages of `getForeignKeys` in the code pass null only for 
parentDb/table; foreignDb/table are always populated. So let's skip it, as 
you mentioned. 







Issue Time Tracking
---

Worklog Id: (was: 455218)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454544
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 04/Jul/20 17:11
Start Date: 04/Jul/20 17:11
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449789477



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  cacheObjects.setPrimaryKeys(primaryKeys);
+
+  Deadline.startTimer("getForeignKeys");
+  foreignKeys = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);

Review comment:
   We can keep the foreign keys under the foreign key table's wrapper, and 
keep only references (the foreign key db/table and key name) in the parent 
table's wrapper. That will be useful when getForeignKeys is called with null 
for the foreign db/tbl. I just want to confirm whether calls with null as 
input are frequent. If yes, let's do it; if not, ignore this comment.
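
A minimal sketch of the layout suggested above, assuming hypothetical names throughout (`ParentReferenceSketch`, `FkRef`, string table keys, and a string payload standing in for the real foreign key object). None of this is taken from the pull request; it only illustrates keeping the full key with the foreign table and a lightweight reference with the parent table. Locking is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: full foreign keys live with the foreign (child) table,
// while the parent table keeps only lightweight references to them.
class ParentReferenceSketch {
  // Reference stored on the parent side: which child table owns the key, and the key name.
  static final class FkRef {
    final String foreignTableKey;
    final String fkName;

    FkRef(String foreignTableKey, String fkName) {
      this.foreignTableKey = foreignTableKey;
      this.fkName = fkName;
    }
  }

  // foreign table key -> (fk name -> key payload; a plain string in this sketch)
  private final Map<String, Map<String, String>> keysByForeignTable = new HashMap<>();
  // parent table key -> references to the keys that point at that parent
  private final Map<String, List<FkRef>> refsByParentTable = new HashMap<>();

  void addForeignKey(String foreignTableKey, String parentTableKey, String fkName, String payload) {
    keysByForeignTable.computeIfAbsent(foreignTableKey, k -> new HashMap<>()).put(fkName, payload);
    refsByParentTable.computeIfAbsent(parentTableKey, k -> new ArrayList<>())
        .add(new FkRef(foreignTableKey, fkName));
  }

  // Lookup when the foreign table is unknown: follow the parent's references
  // instead of scanning every cached table.
  List<String> listByParent(String parentTableKey) {
    List<String> result = new ArrayList<>();
    for (FkRef ref : refsByParentTable.getOrDefault(parentTableKey, new ArrayList<FkRef>())) {
      Map<String, String> keys = keysByForeignTable.get(ref.foreignTableKey);
      if (keys != null && keys.containsKey(ref.fkName)) {
        result.add(keys.get(ref.fkName));
      }
    }
    return result;
  }
}
```

With this layout only one copy of each key is stored; the parent side holds just the table key and key name needed to find it.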







Issue Time Tracking
---

Worklog Id: (was: 454544)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454344
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 09:38
Start Date: 03/Jul/20 09:38
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449486010



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -1788,6 +2082,58 @@ public void addPartitionToCache(String catName, String 
dbName, String tblName, P
 }
   }
 
+  public void addPrimaryKeysToCache(String catName, String dbName, String 
tblName, List keys) {
+try {
+  cacheLock.readLock().lock();

Review comment:
   This is fine: the method below (cachePrimaryKeys) takes the write lock. 
The read lock here is only used to get the tblWrapper object.
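
A rough sketch of the two-level locking described above, under the assumption that the cache-wide lock only protects the table map while each table entry guards its own contents with a separate lock. `TwoLevelLockSketch`, `TableEntry`, and the string key names are hypothetical stand-ins, not the actual SharedCache classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of cache-level read lock + table-level write lock.
class TwoLevelLockSketch {
  // Stand-in for a table wrapper; it guards its own key list.
  static final class TableEntry {
    private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    private final List<String> primaryKeyNames = new ArrayList<>();

    void cachePrimaryKeys(List<String> names) {
      tableLock.writeLock().lock(); // mutation happens under the table's write lock
      try {
        primaryKeyNames.addAll(names);
      } finally {
        tableLock.writeLock().unlock();
      }
    }

    List<String> getPrimaryKeys() {
      tableLock.readLock().lock();
      try {
        return new ArrayList<>(primaryKeyNames); // defensive copy
      } finally {
        tableLock.readLock().unlock();
      }
    }
  }

  private final ReentrantReadWriteLock cacheLock = new ReentrantReadWriteLock();
  private final Map<String, TableEntry> tableCache = new ConcurrentHashMap<>();

  void addTable(String tableKey) {
    cacheLock.writeLock().lock(); // the map itself changes, so take the cache write lock
    try {
      tableCache.putIfAbsent(tableKey, new TableEntry());
    } finally {
      cacheLock.writeLock().unlock();
    }
  }

  void addPrimaryKeysToCache(String tableKey, List<String> names) {
    cacheLock.readLock().lock(); // only looking up the entry, so a read lock is enough
    try {
      TableEntry entry = tableCache.get(tableKey);
      if (entry != null) {
        entry.cachePrimaryKeys(names);
      }
    } finally {
      cacheLock.readLock().unlock();
    }
  }

  List<String> listCachedPrimaryKeys(String tableKey) {
    cacheLock.readLock().lock();
    try {
      TableEntry entry = tableCache.get(tableKey);
      return entry == null ? new ArrayList<String>() : entry.getPrimaryKeys();
    } finally {
      cacheLock.readLock().unlock();
    }
  }
}
```

The cache-level read lock lets several threads update different tables at once, since each update only contends on its own table's write lock.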







Issue Time Tracking
---

Worklog Id: (was: 454344)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454299
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 06:37
Start Date: 03/Jul/20 06:37
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449401898



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -1870,6 +2228,122 @@ public void removePartitionsFromCache(String catName, 
String dbName, String tblN
 return parts;
   }
 
+  public List listCachedPrimaryKeys(String catName, String 
dbName, String tblName) {
+List keys = new ArrayList<>();
+try {
+  cacheLock.readLock().lock();
+  TableWrapper tblWrapper = 
tableCache.getIfPresent(CacheUtils.buildTableKey(catName, dbName, tblName));
+  if (tblWrapper != null) {
+keys = tblWrapper.getPrimaryKeys();
+  }
+} finally {
+  cacheLock.readLock().unlock();
+}
+return keys;
+  }
+
+  public List listCachedForeignKeys(String catName, String 
foreignDbName, String foreignTblName,
+   String parentDbName, String 
parentTblName) {
+List keys = new ArrayList<>();
+try {
+  cacheLock.readLock().lock();
+  TableWrapper tblWrapper = 
tableCache.getIfPresent(CacheUtils.buildTableKey(catName, foreignDbName, 
foreignTblName));
+  if (tblWrapper != null) {
+keys = tblWrapper.getForeignKeys();
+  }
+} finally {
+  cacheLock.readLock().unlock();
+}
+
+// filter out required foreign keys based on parent db/tbl name
+if (!StringUtils.isEmpty(parentTblName) && 
!StringUtils.isEmpty(parentDbName)) {

Review comment:
   Even if tblWrapper is null, keys will be an empty list, and hence an empty 
list will be returned. So this should be fine, right?
   
   If we move it into the if block above, and the list is not empty, we will 
hold the read lock for longer (though only by milliseconds or even less). 
That's why I added it below.
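
The trade-off above can be sketched as follows: copy the cached list while holding the read lock, release the lock, then filter by parent db/table. This is an illustrative sketch only; `ForeignKeyFilterSketch`, `Fk`, and the string table keys are hypothetical and much simpler than the real `SQLForeignKey` handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Collectors;

// Hypothetical sketch: snapshot under the read lock, filter after releasing it.
class ForeignKeyFilterSketch {
  static final class Fk {
    final String name;
    final String parentDb;
    final String parentTbl;

    Fk(String name, String parentDb, String parentTbl) {
      this.name = name;
      this.parentDb = parentDb;
      this.parentTbl = parentTbl;
    }
  }

  private final ReentrantReadWriteLock cacheLock = new ReentrantReadWriteLock();
  private final Map<String, List<Fk>> fksByForeignTable = new HashMap<>();

  void put(String foreignTableKey, List<Fk> fks) {
    cacheLock.writeLock().lock();
    try {
      fksByForeignTable.put(foreignTableKey, new ArrayList<>(fks));
    } finally {
      cacheLock.writeLock().unlock();
    }
  }

  List<Fk> list(String foreignTableKey, String parentDb, String parentTbl) {
    List<Fk> keys = new ArrayList<>();
    cacheLock.readLock().lock();
    try {
      List<Fk> cached = fksByForeignTable.get(foreignTableKey);
      if (cached != null) {
        keys = new ArrayList<>(cached); // snapshot; stays empty if the table is not cached
      }
    } finally {
      cacheLock.readLock().unlock();
    }

    // Filtering runs on the private snapshot, so no lock is held here.
    if (parentDb != null && !parentDb.isEmpty() && parentTbl != null && !parentTbl.isEmpty()) {
      return keys.stream()
          .filter(k -> parentDb.equalsIgnoreCase(k.parentDb) && parentTbl.equalsIgnoreCase(k.parentTbl))
          .collect(Collectors.toList());
    }
    return keys;
  }
}
```

An uncached table simply yields an empty snapshot, so the filter step needs no separate null check.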







Issue Time Tracking
---

Worklog Id: (was: 454299)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454333
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 09:05
Start Date: 03/Jul/20 09:05
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449387162



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  cacheObjects.setPrimaryKeys(primaryKeys);
+
+  Deadline.startTimer("getForeignKeys");
+  foreignKeys = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);

Review comment:
   Then should we store foreign key mappings against parentDb and table for 
quick access (otherwise we will be scanning all the db/tables in cache)? 
   
   And this also means we will be keeping two copies, one with parent table and 
another with foreign table.







Issue Time Tracking
---

Worklog Id: (was: 454333)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454385
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 11:15
Start Date: 03/Jul/20 11:15
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449528837



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -867,6 +909,77 @@ private void updateTableColStats(RawStore rawStore, String 
catName, String dbNam
   }
 }
 
+private void updateTableForeignKeys(RawStore rawStore, String catName, 
String dbName, String tblName) {
+  LOG.debug("CachedStore: updating cached foreign keys objects for 
catalog: {}, database: {}, table: {}", catName,
+  dbName, tblName);
+  try {
+Deadline.startTimer("getForeignKeys");
+List fks = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);
+Deadline.stopTimer();
+
sharedCache.refreshForeignKeysInCache(StringUtils.normalizeIdentifier(catName),
+StringUtils.normalizeIdentifier(dbName), 
StringUtils.normalizeIdentifier(tblName), fks);
+LOG.debug("CachedStore: updated cached foreign keys objects for 
catalog: {}, database: {}, table: {}", catName,
+dbName, tblName);
+  } catch (MetaException e) {
+LOG.info("Updating CachedStore: unable to read foreign keys of 
catalog: " + catName + ", database: "

Review comment:
   Fixed.







Issue Time Tracking
---

Worklog Id: (was: 454385)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454384
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 11:14
Start Date: 03/Jul/20 11:14
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449528675



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -514,6 +655,130 @@ public boolean containsPartition(List partVals) {
   return containsPart;
 }
 
+public void removeConstraint(String name) {
+  try {
+tableLock.writeLock().lock();
+Object constraint = null;
+MemberName mn = null;
+Class constraintClass = null;
+if (this.primaryKeyCache.containsKey(name)) {
+  constraint = this.primaryKeyCache.remove(name);
+  mn = MemberName.PRIMARY_KEY_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLPrimaryKey.class;
+} else if (this.foreignKeyCache.containsKey(name)) {
+  constraint = this.foreignKeyCache.remove(name);
+  mn = MemberName.FOREIGN_KEY_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLForeignKey.class;
+} else if (this.notNullConstraintCache.containsKey(name)) {
+  constraint = this.notNullConstraintCache.remove(name);
+  mn = MemberName.NOTNULL_CONSTRAINT_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLNotNullConstraint.class;
+} else if (this.uniqueConstraintCache.containsKey(name)) {
+  constraint = this.uniqueConstraintCache.remove(name);
+  mn = MemberName.UNIQUE_CONSTRAINT_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLUniqueConstraint.class;
+}
+
+if(constraint == null) {
+  LOG.debug("Constraint: " + name + " does not exist in cache.");
+  return;
+}
+int size = getObjectSize(constraintClass, constraint);
+updateMemberSize(mn, -1 * size, SizeMode.Delta);
+
+  } finally {
+tableLock.writeLock().unlock();
+  }
+}
+
+public void refreshPrimaryKeys(List keys) {
+  Map newKeys = new ConcurrentHashMap<>();
+  try {
+tableLock.writeLock().lock();
+int size = 0;
+for (SQLPrimaryKey key : keys) {
+  if 
(this.memberCacheDirty[MemberName.PRIMARY_KEY_CACHE.ordinal()].compareAndSet(true,
 false)) {

Review comment:
   Updated the name of the variable. It is used during the refresh operation. 
If the flag for a particular object cache is set to true, it means the cache 
was updated after the last refresh operation and should be refreshed now; 
otherwise, the current refresh operation will not modify/refresh the cache.
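
A small sketch of the dirty-flag check, following the quoted hunk as it reads: when the flag is set, `compareAndSet(true, false)` clears it and the in-flight refresh is skipped, so a snapshot taken before the change does not overwrite it; a later refresh then goes through. `DirtyFlagRefreshSketch` and its string map are hypothetical, and `synchronized` stands in for the table lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a dirty flag that guards a background refresh.
class DirtyFlagRefreshSketch {
  private final AtomicBoolean dirty = new AtomicBoolean(false);
  private Map<String, String> keys = new HashMap<>();

  // A direct change to the cache marks it dirty.
  synchronized void removeKey(String name) {
    if (keys.remove(name) != null) {
      dirty.set(true);
    }
  }

  // Returns true if the snapshot was applied, false if it was skipped.
  synchronized boolean refresh(Map<String, String> snapshot) {
    // compareAndSet succeeds only when the flag was true; it also resets it to false.
    if (dirty.compareAndSet(true, false)) {
      return false; // the snapshot may predate the change, so do not apply it
    }
    keys = new HashMap<>(snapshot);
    return true;
  }

  synchronized Set<String> names() {
    return new TreeSet<>(keys.keySet());
  }
}
```

Checking the flag once before building the new map, rather than once per key, would also cover an empty snapshot; that placement is a choice made in this sketch, not something stated in the review.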







Issue Time Tracking
---

Worklog Id: (was: 454384)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454383
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 11:11
Start Date: 03/Jul/20 11:11
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449527172



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -514,6 +655,130 @@ public boolean containsPartition(List partVals) {
   return containsPart;
 }
 
+public void removeConstraint(String name) {
+  try {
+tableLock.writeLock().lock();
+Object constraint = null;
+MemberName mn = null;
+Class constraintClass = null;
+if (this.primaryKeyCache.containsKey(name)) {

Review comment:
   Fixed.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
##
@@ -514,6 +655,130 @@ public boolean containsPartition(List partVals) {
   return containsPart;
 }
 
+public void removeConstraint(String name) {
+  try {
+tableLock.writeLock().lock();
+Object constraint = null;
+MemberName mn = null;
+Class constraintClass = null;
+if (this.primaryKeyCache.containsKey(name)) {
+  constraint = this.primaryKeyCache.remove(name);
+  mn = MemberName.PRIMARY_KEY_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLPrimaryKey.class;
+} else if (this.foreignKeyCache.containsKey(name)) {
+  constraint = this.foreignKeyCache.remove(name);
+  mn = MemberName.FOREIGN_KEY_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLForeignKey.class;
+} else if (this.notNullConstraintCache.containsKey(name)) {
+  constraint = this.notNullConstraintCache.remove(name);
+  mn = MemberName.NOTNULL_CONSTRAINT_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLNotNullConstraint.class;
+} else if (this.uniqueConstraintCache.containsKey(name)) {
+  constraint = this.uniqueConstraintCache.remove(name);
+  mn = MemberName.UNIQUE_CONSTRAINT_CACHE;
+  this.memberCacheDirty[mn.ordinal()].set(true);
+  constraintClass = SQLUniqueConstraint.class;
+}
+
+if(constraint == null) {
+  LOG.debug("Constraint: " + name + " does not exist in cache.");
+  return;
+}
+int size = getObjectSize(constraintClass, constraint);
+updateMemberSize(mn, -1 * size, SizeMode.Delta);
+
+  } finally {
+tableLock.writeLock().unlock();
+  }
+}
+
+public void refreshPrimaryKeys(List keys) {
+  Map newKeys = new ConcurrentHashMap<>();
+  try {
+tableLock.writeLock().lock();
+int size = 0;
+for (SQLPrimaryKey key : keys) {
+  if 
(this.memberCacheDirty[MemberName.PRIMARY_KEY_CACHE.ordinal()].compareAndSet(true,
 false)) {
+LOG.debug("Skipping primary key cache update for table: " + 
getTable().getTableName()
++ "; the primary keys we have is dirty.");
+return;
+  }
+  newKeys.put(key.getPk_name(), key);
+  size += getObjectSize(SQLPrimaryKey.class, key);
+}
+primaryKeyCache = newKeys;
+updateMemberSize(MemberName.PRIMARY_KEY_CACHE, size, 
SizeMode.Snapshot);

Review comment:
   done.







Issue Time Tracking
---

Worklog Id: (was: 454383)
Time Spent: 2h 20m  (was: 2h 10m)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454379
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 11:07
Start Date: 03/Jul/20 11:07
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449525799



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2599,82 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);

Review comment:
   Done. Keys are now fetched from rawStore if the cache returns empty/null.
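
The fallback described above might look roughly like this. `CacheFallbackSketch` and `cachedOrRaw` are hypothetical names, and the `Supplier` stands in for the call to the raw store; the real method signatures are in the pull request.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: use the cached list unless it is null or empty.
class CacheFallbackSketch {
  static <T> List<T> cachedOrRaw(List<T> cached, Supplier<List<T>> rawLookup) {
    if (cached == null || cached.isEmpty()) {
      // Empty can mean "not cached yet" or "no keys exist"; the raw store settles it.
      return rawLookup.get();
    }
    return cached;
  }
}
```

One consequence of this choice is that a table with no primary keys at all goes to the raw store on every call, since an empty cached list cannot be told apart from a missing one.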







Issue Time Tracking
---

Worklog Id: (was: 454379)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454377
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 11:06
Start Date: 03/Jul/20 11:06
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449525538



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -867,6 +909,77 @@ private void updateTableColStats(RawStore rawStore, String 
catName, String dbNam
   }
 }
 
+private void updateTableForeignKeys(RawStore rawStore, String catName, 
String dbName, String tblName) {
+  LOG.debug("CachedStore: updating cached foreign keys objects for 
catalog: {}, database: {}, table: {}", catName,
+  dbName, tblName);
+  try {
+Deadline.startTimer("getForeignKeys");
+List fks = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);
+Deadline.stopTimer();
+
sharedCache.refreshForeignKeysInCache(StringUtils.normalizeIdentifier(catName),
+StringUtils.normalizeIdentifier(dbName), 
StringUtils.normalizeIdentifier(tblName), fks);
+LOG.debug("CachedStore: updated cached foreign keys objects for 
catalog: {}, database: {}, table: {}", catName,
+dbName, tblName);
+  } catch (MetaException e) {

Review comment:
   done.







Issue Time Tracking
---

Worklog Id: (was: 454377)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454283
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 05:47
Start Date: 03/Jul/20 05:47
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449387162



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  cacheObjects.setPrimaryKeys(primaryKeys);
+
+  Deadline.startTimer("getForeignKeys");
+  foreignKeys = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);

Review comment:
   Then should we store foreign key mappings against parentDb and table for 
quick access (otherwise we will be scanning all the db/tables in cache)? 
   
   And this also means we will be keeping two copies, one with parent table and 
another with foreign table.







Issue Time Tracking
---

Worklog Id: (was: 454283)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=453958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453958
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 13:41
Start Date: 02/Jul/20 13:41
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r447415363



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+
+return keys;
   }
 
   @Override public List getForeignKeys(String catName, String 
parentDbName, String parentTblName,
   String foreignDbName, String foreignTblName) throws MetaException {
-// TODO constraintCache
-return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);
+ // Get correct ForeignDBName and TableName
+if (foreignDbName == null || foreignTblName == null) {
+  return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);

Review comment:
   This flow is a candidate for improvement, as it tries to fetch all 
foreign keys of a given parent table and vice versa, which are frequent 
operations. Please create a follow-up JIRA to use CachedStore for this case too.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
 
   @Override public List<SQLPrimaryKey> getPrimaryKeys(String catName, String dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List<SQLPrimaryKey> keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName);
+
+return keys;
   }
 
   @Override public List<SQLForeignKey> getForeignKeys(String catName, String parentDbName, String parentTblName,
   String foreignDbName, String foreignTblName) throws MetaException {
-// TODO constraintCache
-return rawStore.getForeignKeys(catName, parentDbName, parentTblName, foreignDbName, foreignTblName);
+ // Get correct ForeignDBName and TableName
+if (foreignDbName == null || foreignTblName == null) {

Review comment:
   We should take the same path if parentDbName or parentTblName is null.
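
   A minimal sketch of the guard this comment asks for, not the code from the 
patch. The class `ForeignKeyGuard` and the helper `mustUseRawStore` are 
hypothetical names used only for illustration; the sketch assumes that "the 
same path" means falling back to the raw store whenever any of the four names 
is null.

```java
public class ForeignKeyGuard {
  // Hypothetical helper: true when the lookup should bypass the cache and
  // go to the raw store because one of the identifying names is missing.
  public static boolean mustUseRawStore(String parentDbName, String parentTblName,
      String foreignDbName, String foreignTblName) {
    return parentDbName == null || parentTblName == null
        || foreignDbName == null || foreignTblName == null;
  }

  public static void main(String[] args) {
    // Parent side unknown: take the raw-store path.
    System.out.println(mustUseRawStore(null, null, "db", "tbl"));
    // All four names known: the cache may be consulted.
    System.out.println(mustUseRawStore("pdb", "ptbl", "db", "tbl"));
  }
}
```

   With such a check the existing null test on the foreign names and the 
requested null test on the parent names collapse into one condition.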

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -867,6 +909,77 @@ private void updateTableColStats(RawStore rawStore, String catName, String dbNam
   }
 }
 
+private void updateTableForeignKeys(RawStore rawStore, String catName, String dbName, String tblName) {
+  LOG.debug("CachedStore: updating cached foreign keys objects for catalog: {}, database: {}, table: {}", catName,
+  dbName, tblName);
+  try {
+Deadline.startTimer("getForeignKeys");
+List<SQLForeignKey> fks = rawStore.getForeignKeys(catName, null, null, dbName, tblName);
+Deadline.stopTimer();
+sharedCache.refreshForeignKeysInCache(StringUtils.normalizeIdentifier(catName),
+StringUtils.normalizeIdentifier(dbName), StringUtils.normalizeIdentifier(tblName), fks);
+LOG.debug("CachedStore: updated cached foreign keys objects for catalo

[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=452263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452263
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 11:33
Start Date: 29/Jun/20 11:33
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r446900743



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2599,82 @@ long getPartsFound() {
 
   @Override public List<SQLPrimaryKey> getPrimaryKeys(String catName, String dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List<SQLPrimaryKey> keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName);

Review comment:
   Yes. While the cache is being updated, it is possible that the table has 
been updated but its constraints have not (they are yet to be updated). But this is 
similar to partition/columnStats caching.
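
   The read path under discussion can be sketched as a read-through lookup 
with a raw-store fallback. This is a simplified illustration under stated 
assumptions, not the CachedStore implementation: `ConstraintReadThrough`, 
`cacheTable` and the string-based key lists are hypothetical stand-ins for the 
shared cache and the constraint objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ConstraintReadThrough {
  // Hypothetical stand-in for the shared cache: table name -> primary key names.
  private final Map<String, List<String>> cachedKeys = new HashMap<>();
  // Hypothetical stand-in for the raw store lookup.
  private final Function<String, List<String>> rawStore;

  public ConstraintReadThrough(Function<String, List<String>> rawStore) {
    this.rawStore = rawStore;
  }

  public void cacheTable(String tblName, List<String> keys) {
    cachedKeys.put(tblName.toLowerCase(), keys);
  }

  public List<String> getPrimaryKeys(String tblName) {
    String tbl = tblName.toLowerCase(); // simplified identifier normalization
    List<String> keys = cachedKeys.get(tbl);
    if (keys == null) {
      // Table not loaded in the cache yet: fall back to the raw store.
      return rawStore.apply(tbl);
    }
    // Served from the cache; may lag the raw store until the next refresh.
    return keys;
  }

  public static void main(String[] args) {
    ConstraintReadThrough store = new ConstraintReadThrough(t -> Arrays.asList("raw_pk"));
    store.cacheTable("tbl", Arrays.asList("pk1"));
    System.out.println(store.getPrimaryKeys("TBL"));   // cached entry
    System.out.println(store.getPrimaryKeys("other")); // raw-store fallback
  }
}
```

   In this sketch a cached entry is returned as-is, so a reader can see 
slightly stale constraints between refreshes, which is the trade-off described 
above.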







Issue Time Tracking
---

Worklog Id: (was: 452263)
Time Spent: 1h  (was: 50m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from tables involved in query, which results multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=452256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452256
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 11:29
Start Date: 29/Jun/20 11:29
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r446898404



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +556,24 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getForeignKeys");
+  rawStore.getForeignKeys(catName, null, null, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getUniqueConstraints");
+  rawStore.getUniqueConstraints(catName, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getNotNullConstraints");
+  rawStore.getNotNullConstraints(catName, dbName, tblName);
+  Deadline.stopTimer();
+
   // If the table could not cached due to memory limit, stop prewarm
   boolean isSuccess = sharedCache
   .populateTableInCache(table, tableColStats, partitions, partitionColStats, aggrStatsAllPartitions,

Review comment:
   Done. Though the new class just contains constraint objects for now, we 
can open a separate refactoring JIRA for partition/column stats that also 
refactors the array created to store the size/dirtyCache variables.







Issue Time Tracking
---

Worklog Id: (was: 452256)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=451534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451534
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 12:18
Start Date: 26/Jun/20 12:18
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r445577541



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -402,6 +385,32 @@ private static void updateStatsForAlterTable(RawStore rawStore, Table tblBefore,
 sharedCache.removePartitionColStatsFromCache(catalogName, dbName, tableName, msgPart.getPartValues(),
 msgPart.getColName());
 break;
+  case MessageBuilder.ADD_PRIMARYKEY_EVENT:
+  AddPrimaryKeyMessage addPrimaryKeyMessage = deserializer.getAddPrimaryKeyMessage(message);

Review comment:
   Should be 2-space indentation. Check other places too.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java
##
@@ -295,6 +295,178 @@ public void testTableOpsForUpdateUsingEvents() throws Exception {
 sharedCache.getSdCache().clear();
   }
 
+  @Test
+  public void testConstraintsForUpdateUsingEvents() throws Exception {
+long lastEventId = -1;
+RawStore rawStore = hmsHandler.getMS();
+
+// Prewarm CachedStore
+CachedStore.setCachePrewarmedState(false);
+CachedStore.prewarm(rawStore);
+
+// Add a db via rawStore
+String dbName = "test_table_ops";
+String dbOwner = "user1";
+Database db = createTestDb(dbName, dbOwner);
+hmsHandler.create_database(db);
+db = rawStore.getDatabase(DEFAULT_CATALOG_NAME, dbName);
+
+String foreignDbName = "test_table_ops_foreign";
+Database foreignDb = createTestDb(foreignDbName, dbOwner);
+hmsHandler.create_database(foreignDb);
+foreignDb = rawStore.getDatabase(DEFAULT_CATALOG_NAME, foreignDbName);
+// Add a table via rawStore
+String tblName = "tbl";
+String tblOwner = "user1";
+FieldSchema col1 = new FieldSchema("col1", "int", "integer column");
+FieldSchema col2 = new FieldSchema("col2", "string", "string column");
+List cols = new ArrayList();
+cols.add(col1);
+cols.add(col2);
+List ptnCols = new ArrayList();
+Table tbl = createTestTbl(dbName, tblName, tblOwner, cols, ptnCols);
+String foreignTblName = "ftbl";
+Table foreignTbl = createTestTbl(foreignDbName, foreignTblName, tblOwner, cols, ptnCols);
+
+SQLPrimaryKey key = new SQLPrimaryKey(dbName, tblName, col1.getName(), 1, "pk1",
+false, false, false);
+SQLUniqueConstraint uC = new SQLUniqueConstraint(DEFAULT_CATALOG_NAME, dbName, tblName,
+col1.getName(), 2, "uc1", false, false, false);
+SQLNotNullConstraint nN = new SQLNotNullConstraint(DEFAULT_CATALOG_NAME, dbName, tblName,
+col1.getName(), "nn1", false, false, false);
+SQLForeignKey foreignKey = new SQLForeignKey(key.getTable_db(), key.getTable_name(), key.getColumn_name(),
+foreignDbName, foreignTblName, key.getColumn_name(), 2, 1,2,
+"fk1", key.getPk_name(), false, false, false);
+
+hmsHandler.create_table_with_constraints(tbl,
+Arrays.asList(key), null, Arrays.asList(uC), Arrays.asList(nN), null, null);
+hmsHandler.create_table_with_constraints(foreignTbl, null, Arrays.asList(foreignKey),
+null, null, null, null);
+
+tbl = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
+foreignTbl = rawStore.getTable(DEFAULT_CATALOG_NAME, foreignDbName, foreignTblName);
+
+// Read database, table via CachedStore
+Database dbRead= sharedCache.getDatabaseFromCache(DEFAULT_CATALOG_NAME, dbName);
+Assert.assertEquals(db, dbRead);
+Table tblRead = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME, dbName, tblName);
+compareTables(tblRead, tbl);
+
+Table foreignTblRead = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME, foreignDbName, foreignTblName);
+compareTables(foreignTblRead, foreignTbl);
+
+List<SQLPrimaryKey> keys = rawStore.getPrimaryKeys(DEFAULT_CATALOG_NAME, dbName, tblName);
+List<SQLPrimaryKey> keysRead = sharedCache.listCachedPrimaryKeys(DEFAULT_CATALOG_NAME, dbName, tblName);
+assertsForPrimarkaryKey(keysRead, 1, 0, keys.get(0));
+
+List<SQLNotNullConstraint> nNs = rawStore.getNotNullConstraints(DEFAULT_CATALOG_NAME, dbName, tblName);
+List<SQLNotNullConstraint> nNsRead = sharedCache.listCachedNotNullConstraints(DEFAULT_CATALOG_NAME, dbName, tblName);
+assertsForNotNullConstraints(nNsRead, 1, 0, nNs.get(0));
+
+List<SQLUniqueConstraint> uns = rawStore.getUniqueConstraints(DEFAULT_CATALOG_NAME, dbName, tblName);
+List<SQLUniqueConstraint> unsRead = sharedCache.listCachedUn

[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=446407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446407
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 10:45
Start Date: 16/Jun/20 10:45
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on pull request #1109:
URL: https://github.com/apache/hive/pull/1109#issuecomment-644686392


   @sankarh please take a look at the patch. 
   
   The test failures seem unrelated, though I will try to run them locally to 
verify. 





Issue Time Tracking
---

Worklog Id: (was: 446407)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=445769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-445769
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 15/Jun/20 10:31
Start Date: 15/Jun/20 10:31
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on pull request #1109:
URL: https://github.com/apache/hive/pull/1109#issuecomment-644046296


   This patch doesn't add caching for default/unique constraints as they 
aren't pushed to notification logs. I have opened a jira for the same: 
HIVE-23618.
   
   I will add unique/default constraints as part of separate jira once 
HIVE-23618 is resolved.





Issue Time Tracking
---

Worklog Id: (was: 445769)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=445767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-445767
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 15/Jun/20 10:29
Start Date: 15/Jun/20 10:29
Worklog Time Spent: 10m 
  Work Description: adesh-rao opened a new pull request #1109:
URL: https://github.com/apache/hive/pull/1109


   





Issue Time Tracking
---

Worklog Id: (was: 445767)
Remaining Estimate: 0h
Time Spent: 10m

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive will pull all constraints 
> from tables involved in query, which results multiple db reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc). The effort 
> to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)