kgyrtkirk commented on a change in pull request #1105:
URL: https://github.com/apache/hive/pull/1105#discussion_r451502201



##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
##########
@@ -734,6 +734,21 @@ dropPartitionOperator
     EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO | 
GREATERTHAN
     ;
 
+filterPartitionSpec
+    :
+    LPAREN filterPartitionVal (COMMA  filterPartitionVal )* RPAREN -> 
^(TOK_PARTSPEC filterPartitionVal +)
+    ;
+
+filterPartitionVal
+    :
+    identifier filterPartitionOperator constant -> ^(TOK_PARTVAL identifier 
filterPartitionOperator constant)

Review comment:
       The old `partitionSpec` rule did not require the constant:
   ```
   identifier (EQUAL constant)? 
   ```
   
   Were there any use cases for that?

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path> 
partPaths,
     // now check the table folder and see if we find anything
     // that isn't in the metastore
     Set<Path> allPartDirs = new HashSet<Path>();
+    Set<Path> partDirs = new HashSet<Path>();
+    List<FieldSchema> partColumns = table.getPartitionKeys();
     checkPartitionDirs(tablePath, allPartDirs, 
Collections.unmodifiableList(getPartColNames(table)));
+
+    if (filterExp != null) {
+      PartitionExpressionProxy expressionProxy = createExpressionProxy(conf);
+      List<String> paritions = new ArrayList<>();
+      for (Path path : allPartDirs) {
+        // remove the table's path from the partition path
+        // eg: <tablePath>/p1=1/p2=2/p3=3 ---> p1=1/p2=2/p3=3
+        paritions.add(path.toString().substring(tablePath.toString().length() 
+ 1));
+      }
+      // Remove all partition paths which does not matches the filter 
expression.
+      expressionProxy.filterPartitionsByExpr(partColumns, filterExp,
+          conf.get(MetastoreConf.ConfVars.DEFAULTPARTITIONNAME.getVarname()), 
paritions);
+
+      // now the partition list will contain all the paths that matches the 
filter expression.
+      // add them back to partDirs.
+      for (String path : paritions) {
+        partDirs.add(new Path(tablePath.toString() + "/" + path));

Review comment:
       Instead of concatenating with `/`, use `new Path(parentPath, child)`; 
it's more portable.

##########
File path: itests/src/test/resources/testconfiguration.properties
##########
@@ -222,6 +222,7 @@ mr.query.files=\
   mapjoin_subquery2.q,\
   mapjoin_test_outer.q,\
   masking_5.q,\
+  msck_repair_filter.q,\

Review comment:
       Is there a reason we run this test with MR?

##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##########
@@ -1942,9 +1942,8 @@ metastoreCheck
 @after { popMsg(state); }
     : KW_MSCK (repair=KW_REPAIR)?
       (KW_TABLE tableName
-        ((add=KW_ADD | drop=KW_DROP | sync=KW_SYNC) (parts=KW_PARTITIONS))? |
-        (partitionSpec)?)
-    -> ^(TOK_MSCK $repair? tableName? $add? $drop? $sync? (partitionSpec*)?)
+        ((add=KW_ADD | drop=KW_DROP | sync=KW_SYNC) (parts=KW_PARTITIONS) 
(filterPartitionSpec)?)?)
+    -> ^(TOK_MSCK $repair? tableName? $add? $drop? $sync? 
(filterPartitionSpec)?)

Review comment:
       I know it was here before, but let's fix this up:
   
   Instead of separate `add`/`drop`/`sync` variables, we could have a single 
`opt=(KW_ADD|KW_DROP|KW_SYNC)`. That would make the other end more readable as 
well.
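
   A rough sketch of how the consuming side might look with a single captured 
token. `MsckOptionSketch`, `RepairOption`, and `fromToken` are hypothetical 
names used only for illustration; they are not taken from the PR or from Hive.

```java
import java.util.Locale;

// Hypothetical stand-in for the analyzer side; not Hive's actual code.
final class MsckOptionSketch {

    // The three keywords from the grammar, plus NONE for an absent clause.
    enum RepairOption { NONE, ADD, DROP, SYNC }

    // optText is assumed to be the text of the single optional token captured
    // by the grammar (opt=...), or null when the clause was not given.
    static RepairOption fromToken(String optText) {
        if (optText == null) {
            return RepairOption.NONE;
        }
        switch (optText.toUpperCase(Locale.ROOT)) {
            case "ADD":
                return RepairOption.ADD;
            case "DROP":
                return RepairOption.DROP;
            case "SYNC":
                return RepairOption.SYNC;
            default:
                throw new IllegalArgumentException("Unexpected MSCK option: " + optText);
        }
    }
}
```

   With one value to inspect, the analyzer needs a single lookup instead of 
three separate null checks.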

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java
##########
@@ -63,13 +67,24 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
     }
 
     Table table = getTable(tableName);
-    List<Map<String, String>> specs = getPartitionSpecs(table, root);
+    Map<Integer, List<ExprNodeGenericFuncDesc>> partitionSpecs = 
getFullPartitionSpecs(root, table, conf, false);
+    byte[] filterExp = null;
+    if (partitionSpecs != null & !partitionSpecs.isEmpty()) {
+      // explicitly set expression proxy class to 
PartitionExpressionForMetastore since we intend to use the
+      // filterPartitionsByExpr of PartitionExpressionForMetastore for 
partition pruning down the line.
+      conf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),

Review comment:
       I don't think this will work: this is the ql module, while 
`EXPRESSION_PROXY_CLASS` is a metastore conf key. In a remote metastore setup 
this `set` will probably have no effect...
   Have you tried it?
   I think it is fine to check the setting and return an error saying that this 
feature is not available because it requires a conf change.
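
   A minimal sketch of such a check, using a plain `Map` as a stand-in for the 
configuration object so that it runs on its own. The key string and the fully 
qualified class name below are assumptions and should be verified against 
`MetastoreConf` and the Hive source; `ExpressionProxyCheckSketch` and 
`requireExpressionProxy` are hypothetical names.

```java
import java.util.Map;

// Stand-in sketch: fail fast instead of overriding the proxy class.
final class ExpressionProxyCheckSketch {

    // Assumed config key and class name; verify both before relying on them.
    static final String PROXY_KEY = "metastore.expression.proxy";
    static final String REQUIRED_PROXY =
        "org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore";

    // Throws if the configured proxy is not the one the filter feature needs.
    static void requireExpressionProxy(Map<String, String> conf) {
        String configured = conf.get(PROXY_KEY);
        if (!REQUIRED_PROXY.equals(configured)) {
            throw new IllegalStateException(
                "MSCK with a partition filter requires " + PROXY_KEY + "=" + REQUIRED_PROXY
                    + " but found: " + configured);
        }
    }
}
```

   In the analyzer this would presumably surface as a `SemanticException` 
rather than an `IllegalStateException`; the exception type here is only a 
placeholder.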

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java
##########
@@ -63,13 +67,24 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
     }
 
     Table table = getTable(tableName);
-    List<Map<String, String>> specs = getPartitionSpecs(table, root);
+    Map<Integer, List<ExprNodeGenericFuncDesc>> partitionSpecs = 
getFullPartitionSpecs(root, table, conf, false);
+    byte[] filterExp = null;
+    if (partitionSpecs != null & !partitionSpecs.isEmpty()) {
+      // explicitly set expression proxy class to 
PartitionExpressionForMetastore since we intend to use the
+      // filterPartitionsByExpr of PartitionExpressionForMetastore for 
partition pruning down the line.
+      conf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),
+          PartitionExpressionForMetastore.class.getCanonicalName());
+      // fetch the first value of partitionSpecs map since it will always have 
one key, value pair
+      filterExp = SerializationUtilities.serializeExpressionToKryo(

Review comment:
       Why does this need to be flattened into a `byte[]`?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
##########
@@ -837,6 +844,118 @@ public static void checkColumnName(String columnName) 
throws SemanticException {
     return colList;
   }
 
+  /**
+   * Get the partition specs from the tree. This stores the full specification
+   * with the comparator operator into the output list.
+   *
+   * @return Map of partitions by prefix length. Most of the time prefix 
length will
+   *         be the same for all partition specs, so we can just OR the 
expressions.
+   */
+  public static Map<Integer, List<ExprNodeGenericFuncDesc>> 
getFullPartitionSpecs(

Review comment:
       can we find a new home for these 2 `static` methods? :)
   `ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java`

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path> 
partPaths,
     // now check the table folder and see if we find anything
     // that isn't in the metastore
     Set<Path> allPartDirs = new HashSet<Path>();
+    Set<Path> partDirs = new HashSet<Path>();

Review comment:
       Move this variable inside the `if`.

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -240,40 +243,27 @@ void checkTable(String catName, String dbName, String 
tableName,
     }
 
     PartitionIterable parts;
-    boolean findUnknownPartitions = true;
 
     if (isPartitioned(table)) {
-      if (partitions == null || partitions.isEmpty()) {
+      if (filterExp != null) {
+        List<Partition> results = new ArrayList<>();
+        getPartitionListByFilterExp(getMsc(), table, filterExp,

Review comment:
       I wonder if there is a way to keep `filterExp` in a more natural 
form... it will be Kryo-encoded almost all the time, but it seems the 
metastore interface method was designed to accept Kryo-serialized input...

##########
File path: 
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java
##########
@@ -330,17 +330,6 @@ public void testPartitionsCheck() throws HiveException,
     assertEquals(partToRemove.getTable().getTableName(),
         result.getPartitionsNotOnFs().iterator().next().getTableName());
     assertEquals(Collections.<CheckResult.PartitionResult>emptySet(), 
result.getPartitionsNotInMs());
-
-    List<Map<String, String>> partsCopy = new ArrayList<Map<String, String>>();
-    partsCopy.add(partitions.get(1).getSpec());

Review comment:
       Is there a successor to this test?

##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
##########
@@ -734,6 +734,21 @@ dropPartitionOperator
     EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO | 
GREATERTHAN
     ;
 
+filterPartitionSpec
+    :
+    LPAREN filterPartitionVal (COMMA  filterPartitionVal )* RPAREN -> 
^(TOK_PARTSPEC filterPartitionVal +)
+    ;
+
+filterPartitionVal
+    :
+    identifier filterPartitionOperator constant -> ^(TOK_PARTVAL identifier 
filterPartitionOperator constant)
+    ;
+
+filterPartitionOperator
+    :
+    EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO | 
GREATERTHAN | KW_LIKE

Review comment:
       `dropPartitionSpec` seems to use almost the same construct; I don't see 
any reason to duplicate it...
   The only difference I see right now is `LIKE`. Are there any other 
differences?
   
   I think we should reuse the existing rule instead of duplicating it...

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##########
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path> 
partPaths,
     // now check the table folder and see if we find anything
     // that isn't in the metastore
     Set<Path> allPartDirs = new HashSet<Path>();
+    Set<Path> partDirs = new HashSet<Path>();
+    List<FieldSchema> partColumns = table.getPartitionKeys();
     checkPartitionDirs(tablePath, allPartDirs, 
Collections.unmodifiableList(getPartColNames(table)));
+
+    if (filterExp != null) {
+      PartitionExpressionProxy expressionProxy = createExpressionProxy(conf);
+      List<String> paritions = new ArrayList<>();
+      for (Path path : allPartDirs) {
+        // remove the table's path from the partition path
+        // eg: <tablePath>/p1=1/p2=2/p3=3 ---> p1=1/p2=2/p3=3
+        paritions.add(path.toString().substring(tablePath.toString().length() 
+ 1));

Review comment:
       I'm wondering whether `tablePath` could end with a '/'. If it does, 
and `checkPartitionDirs` removes double slashes, this could eat up one extra 
character...
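
   A small sketch of a prefix strip that does not depend on whether the table 
path ends with a slash. It works on plain strings so that it is 
self-contained; `PartitionPathSketch` and `relativePartitionName` are 
hypothetical names, and real code would more likely go through Hadoop's 
`Path` than through strings.

```java
// String-only illustration; not the PR's code.
final class PartitionPathSketch {

    // Drops trailing '/' characters, but keeps a lone "/" intact.
    static String stripTrailingSlashes(String p) {
        int end = p.length();
        while (end > 1 && p.charAt(end - 1) == '/') {
            end--;
        }
        return p.substring(0, end);
    }

    // Returns the part of partitionPath below tablePath,
    // e.g. <tablePath>/p1=1/p2=2 becomes p1=1/p2=2.
    static String relativePartitionName(String tablePath, String partitionPath) {
        String base = stripTrailingSlashes(tablePath);
        String full = stripTrailingSlashes(partitionPath);
        String prefix = base.endsWith("/") ? base : base + "/";
        if (!full.startsWith(prefix)) {
            throw new IllegalArgumentException(
                partitionPath + " is not located under " + tablePath);
        }
        return full.substring(prefix.length());
    }
}
```

   Normalizing the base first means the offset is derived from the actual 
prefix rather than from `length() + 1`, so a trailing slash on the table path 
cannot shift it.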

##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##########
@@ -1348,6 +1348,17 @@ public static Path getPath(Table table) {
     }
   }
 
+  public static void getPartitionListByFilterExp(IMetaStoreClient msc, Table 
table, byte[] filterExp,
+                                                 String defaultPartName, 
List<Partition> results)
+      throws MetastoreException {
+    try {
+      msc.listPartitionsByExpr(table.getCatName(), table.getDbName(), 
table.getTableName(), filterExp,

Review comment:
       This method accepts `byte[]`, and if I'm not wrong it has been like 
this since around 2013.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



