veghlaci05 commented on code in PR #4952:
URL: https://github.com/apache/hive/pull/4952#discussion_r1431236510


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilderForInsertOnly.java:
##########
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.metastore.ColumnType;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.Order;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.SkewedInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.exec.DDLPlanUtils;
+import org.apache.hadoop.hive.ql.util.DirectionUtils;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Builds query strings that help with query-based compaction of insert-only tables.
+ */
+class CompactionQueryBuilderForInsertOnly extends CompactionQueryBuilder {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForInsertOnly.class.getName());

Review Comment:
   `LOG` is not used.



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilderForMajor.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.metastore.ColumnType;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Builds query strings that help with query-based MAJOR compaction of CRUD.
+ */
+class CompactionQueryBuilderForMajor extends CompactionQueryBuilder {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForMajor.class.getName());
+
+  /**
+   * Construct a CompactionQueryBuilderForMajor with required params.
+   *
+   * @param compactionType major or minor or rebalance, e.g. CompactionType.MAJOR.
+   *                       Cannot be null.
+   * @param operation query's Operation e.g. Operation.CREATE.
+   * @throws IllegalArgumentException if compactionType is null
+   */
+  CompactionQueryBuilderForMajor(CompactionType compactionType, Operation operation, String resultTableName) {
+    super(compactionType, operation, false, resultTableName);
+  }

Review Comment:
   `CompactionType.MAJOR` could be passed directly into the super call (hardcoded) 
instead of being taken as a parameter. Passing any other value can cause this class 
to output incorrect SQL. The same applies to the other implementations.
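
   A minimal sketch of the idea, using simplified stand-in types rather than 
the real Hive classes (`HardcodedTypeSketch` and everything nested in it are 
hypothetical names for illustration only):

```java
// Simplified stand-ins for illustration only; these are not the real Hive classes.
final class HardcodedTypeSketch {
  enum CompactionType { MAJOR, MINOR, REBALANCE }
  enum Operation { CREATE, INSERT, DROP }

  abstract static class QueryBuilder {
    protected final CompactionType compactionType;
    protected final Operation operation;
    protected final String resultTableName;

    QueryBuilder(CompactionType compactionType, Operation operation, String resultTableName) {
      if (compactionType == null) {
        throw new IllegalArgumentException("compactionType must not be null");
      }
      this.compactionType = compactionType;
      this.operation = operation;
      this.resultTableName = resultTableName;
    }

    CompactionType getCompactionType() {
      return compactionType;
    }
  }

  static final class MajorQueryBuilder extends QueryBuilder {
    // No compactionType parameter: the type is fixed here, so a caller cannot
    // accidentally construct this builder with MINOR or REBALANCE.
    MajorQueryBuilder(Operation operation, String resultTableName) {
      super(CompactionType.MAJOR, operation, resultTableName);
    }
  }

  private HardcodedTypeSketch() {
  }
}
```

   With this shape the "cannot be null" check for the compaction type stays in 
the base class, and the subclass constructor loses one parameter.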



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilderForInsertOnly.java:
##########
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.metastore.ColumnType;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.Order;
+import org.apache.hadoop.hive.metastore.api.SerDeInfo;
+import org.apache.hadoop.hive.metastore.api.SkewedInfo;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;
+import org.apache.hadoop.hive.ql.exec.DDLPlanUtils;
+import org.apache.hadoop.hive.ql.util.DirectionUtils;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Builds query strings that help with query-based compaction of insert-only tables.
+ */
+class CompactionQueryBuilderForInsertOnly extends CompactionQueryBuilder {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForInsertOnly.class.getName());
+
+  private StorageDescriptor storageDescriptor; // for Create in insert-only
+
+  /**
+   * Construct a CompactionQueryBuilderForInsertOnly with required params.
+   *
+   * @param compactionType major or minor or rebalance, e.g. CompactionType.MAJOR.
+   *                       Cannot be null.
+   * @param operation      query's Operation e.g. Operation.CREATE.
+   * @throws IllegalArgumentException if compactionType is null or the compaction type is REBALANCE
+   */
+  CompactionQueryBuilderForInsertOnly(CompactionType compactionType, Operation operation, String resultTableName) {
+    super(compactionType, operation, true, resultTableName);
+    if (CompactionType.REBALANCE.equals(compactionType)) {
+      throw new IllegalArgumentException("Rebalance compaction is supported only on full ACID tables!");
+    }
+  }
+
+  /**
+   * Set the StorageDescriptor of the table or partition to compact.
+   * Required for Create operations in insert-only compaction.
+   *
+   * @param storageDescriptor StorageDescriptor of the table or partition to compact, not null
+   */
+  CompactionQueryBuilder setStorageDescriptor(StorageDescriptor storageDescriptor) {
+    this.storageDescriptor = storageDescriptor;
+    return this;
+  }
+
+  protected void buildSelectClauseForInsert(StringBuilder query) {
+    // Need list of columns for major crud, mmmajor partitioned, mmminor
+    List<FieldSchema> cols;
+    if (CompactionType.MAJOR.equals(compactionType) && sourcePartition != null || CompactionType.MINOR.equals(
+        compactionType)) {
+      if (sourceTab == null) {
+        return; // avoid NPEs, don't throw an exception but skip this part of the query
+      }
+      cols = sourceTab.getSd().getCols();
+    } else {
+      cols = null;
+    }
+    switch (compactionType) {
+    case MAJOR: {
+      if (sourcePartition != null) { //mmmajor and partitioned
+        appendColumns(query, cols, false);
+      } else { // mmmajor and unpartitioned
+        query.append("*");
+      }
+      break;
+    }
+    case MINOR: {
+      appendColumns(query, cols, false);
+    }
+    }
+  }
+
+  protected void getSourceForInsert(StringBuilder query) {
+    if (sourceTabForInsert != null) {
+      query.append(sourceTabForInsert);
+    } else {
+      query.append(sourceTab.getDbName()).append(".").append(sourceTab.getTableName());
+    }
+    query.append(" ");
+    if (CompactionType.MAJOR.equals(compactionType) && StringUtils.isNotBlank(orderByClause)) {
+      query.append(orderByClause);
+    }
+  }
+
+  protected void buildWhereClauseForInsert(StringBuilder query) {
+    if (CompactionType.MAJOR.equals(compactionType) && sourcePartition != null && sourceTab != null) {
+      List<String> vals = sourcePartition.getValues();
+      List<FieldSchema> keys = sourceTab.getPartitionKeys();
+      if (keys.size() != vals.size()) {
+        throw new IllegalStateException("source partition values (" + Arrays.toString(
+            vals.toArray()) + ") do not match source table values (" + Arrays.toString(
+            keys.toArray()) + "). Failing compaction.");
+      }
+
+      query.append(" where ");
+      for (int i = 0; i < keys.size(); ++i) {
+        FieldSchema keySchema = keys.get(i);
+        query.append(i == 0 ? "`" : " and `").append(keySchema.getName()).append("`=");
+        if (!keySchema.getType().equalsIgnoreCase(ColumnType.BOOLEAN_TYPE_NAME)) {
+          query.append("'").append(vals.get(i)).append("'");
+        } else {
+          query.append(vals.get(i));
+        }
+      }
+    }
+  }
+
+  protected void getDdlForCreate(StringBuilder query) {
+    defineColumns(query);
+
+    // PARTITIONED BY. Used for parts of minor compaction.
+    if (isPartitioned) {
+      query.append(" PARTITIONED BY (`file_name` STRING) ");
+    }
+
+    // CLUSTERED BY. (bucketing)
+    getMmBucketing(query);
+
+    // SKEWED BY
+    getSkewedByClause(query);
+
+    // STORED AS / ROW FORMAT SERDE + INPUTFORMAT + OUTPUTFORMAT
+    copySerdeFromSourceTable(query);
+
+    // LOCATION
+    if (location != null) {
+      query.append(" LOCATION '").append(HiveStringUtils.escapeHiveCommand(location)).append("'");
+    }
+
+    // TBLPROPERTIES
+    addTblProperties(query);
+  }
+
+  /**
+   * Define columns of the create query.
+   */
+  private void defineColumns(StringBuilder query) {
+    if (sourceTab == null) {
+      return; // avoid NPEs, don't throw an exception but skip this part of the query
+    }
+    query.append("(");
+    List<String> columnDescs = getColumnDescs();
+    query.append(StringUtils.join(columnDescs, ','));
+    query.append(") ");
+  }
+
+  /**
+   * Part of Create operation. Copy source table bucketing for insert-only compaction.
+   */
+  private void getMmBucketing(StringBuilder query) {
+    if (sourceTab == null) {
+      return; // avoid NPEs, don't throw an exception but skip this part of the query
+    }
+    boolean isFirst;
+    List<String> buckCols = sourceTab.getSd().getBucketCols();
+    if (buckCols.size() > 0) {
+      query.append("CLUSTERED BY (").append(StringUtils.join(buckCols, ",")).append(") ");
+      List<Order> sortCols = sourceTab.getSd().getSortCols();
+      if (sortCols.size() > 0) {

Review Comment:
   Maybe `!buckCols.isEmpty()` instead of `buckCols.size() > 0`?



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilderForMajor.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.metastore.ColumnType;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Builds query strings that help with query-based MAJOR compaction of CRUD.
+ */
+class CompactionQueryBuilderForMajor extends CompactionQueryBuilder {

Review Comment:
   `getDdlForCreate()`, `defineColumns()` and `addTblProperties()` are exactly the 
same as in the Rebalance implementation. Please check whether the duplication can be 
removed without over-engineering it.
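
   One possible shape, sketched with simplified stand-in classes: a small 
intermediate base class that holds the shared methods. The class names and the 
DDL fragments below are hypothetical placeholders, not the real Hive output.

```java
// Simplified stand-ins for illustration only; these are not the real Hive classes.
final class SharedDdlSketch {

  abstract static class BaseBuilder {
    abstract void getDdlForCreate(StringBuilder query);
  }

  // Hypothetical intermediate class: holds the logic that is identical for
  // the major and rebalance builders, so it is written only once.
  abstract static class CrudBuilder extends BaseBuilder {
    @Override
    void getDdlForCreate(StringBuilder query) {
      defineColumns(query);
      addTblProperties(query);
    }

    void defineColumns(StringBuilder query) {
      // Placeholder column list; the real method derives it from the source table.
      query.append("(`id` int) ");
    }

    void addTblProperties(StringBuilder query) {
      // Placeholder properties; the real method builds them from the table metadata.
      query.append("TBLPROPERTIES ('placeholder'='true')");
    }
  }

  static final class MajorBuilder extends CrudBuilder {
  }

  static final class RebalanceBuilder extends CrudBuilder {
  }

  private SharedDdlSketch() {
  }
}
```

   Whether an extra class in the hierarchy is worth it depends on how much 
else the two builders share, so this is only one option to weigh.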



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilderForRebalance.java:
##########
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.*;
+
+/**
+ * Builds query strings that help with REBALANCE compaction of CRUD.
+ */
+class CompactionQueryBuilderForRebalance extends CompactionQueryBuilder {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForRebalance.class.getName());

Review Comment:
   `LOG` is not used.



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactionQueryBuilderForMajor.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.metastore.ColumnType;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hive.common.util.HiveStringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Builds query strings that help with query-based MAJOR compaction of CRUD.
+ */
+class CompactionQueryBuilderForMajor extends CompactionQueryBuilder {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForMajor.class.getName());

Review Comment:
   `LOG` is not used.



##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java:
##########
@@ -72,10 +72,9 @@ public boolean run(CompactorContext context) throws IOException {
    * See {@link org.apache.hadoop.hive.conf.HiveConf.ConfVars#SPLIT_GROUPING_MODE} for the config description.
    */
   private List<String> getCreateQueries(String fullName, Table t, String tmpTableLocation) {
-    return Lists.newArrayList(new CompactionQueryBuilder(
+    return Lists.newArrayList(new CompactionQueryBuilderForMajor(

Review Comment:
   You could create a factory class that returns the specific 
`CompactionQueryBuilder` implementations. This approach has some benefits:
   - the decision logic can be hidden in the factory method
   - the implementation classes can be hidden from the compactors (the factory 
returns the base class), especially if you move the builders into a 
sub-package. This gives better decoupling between compactors and query builders.
   - the `getCreateQueries()`, `getCompactionQueries()` and `getDropQueries()` 
methods can be moved to `QueryCompactor`
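
   A rough sketch of such a factory, again with simplified stand-in types. 
`BuilderFactorySketch`, its nested builder classes and the `describe()` method 
are hypothetical names for illustration, and the minor builder is assumed to 
exist alongside the ones shown in this PR.

```java
// Simplified stand-ins for illustration only; these are not the real Hive classes.
final class BuilderFactorySketch {
  enum CompactionType { MAJOR, MINOR, REBALANCE }

  // Callers only ever see this base type.
  abstract static class QueryBuilder {
    abstract String describe();
  }

  // The concrete builders are private, so compactors cannot depend on them.
  private static final class MajorBuilder extends QueryBuilder {
    @Override
    String describe() {
      return "major";
    }
  }

  private static final class MinorBuilder extends QueryBuilder {
    @Override
    String describe() {
      return "minor";
    }
  }

  private static final class RebalanceBuilder extends QueryBuilder {
    @Override
    String describe() {
      return "rebalance";
    }
  }

  private static final class InsertOnlyBuilder extends QueryBuilder {
    @Override
    String describe() {
      return "insert-only";
    }
  }

  private BuilderFactorySketch() {
  }

  // All of the decision logic lives here, in one place.
  static QueryBuilder create(CompactionType type, boolean insertOnly) {
    if (type == null) {
      throw new IllegalArgumentException("compactionType must not be null");
    }
    if (insertOnly) {
      if (type == CompactionType.REBALANCE) {
        throw new IllegalArgumentException("Rebalance compaction is supported only on full ACID tables!");
      }
      return new InsertOnlyBuilder();
    }
    switch (type) {
      case MAJOR:
        return new MajorBuilder();
      case MINOR:
        return new MinorBuilder();
      case REBALANCE:
        return new RebalanceBuilder();
      default:
        throw new IllegalArgumentException("Unknown compaction type: " + type);
    }
  }
}
```

   A compactor would then call something like 
`BuilderFactorySketch.create(type, insertOnly)` and work only with the base type.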



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
