[ https://issues.apache.org/jira/browse/HIVE-24855?focusedWorklogId=570100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570100 ]
ASF GitHub Bot logged work on HIVE-24855:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Mar/21 22:19
            Start Date: 22/Mar/21 22:19
    Worklog Time Spent: 10m
      Work Description: jcamachor commented on a change in pull request #2046:
URL: https://github.com/apache/hive/pull/2046#discussion_r599094572


##########
File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapInputFormat.java
##########
@@ -182,11 +190,11 @@ static VectorizedRowBatchCtx createFakeVrbCtx(MapWork mapWork) throws HiveExcept
     RowSchema rowSchema = findTsOp(mapWork).getSchema();
     final List<String> colNames = new ArrayList<String>(rowSchema.getSignature().size());
     final List<TypeInfo> colTypes = new ArrayList<TypeInfo>(rowSchema.getSignature().size());
-    boolean hasRowId = false;
+    ArrayList<VirtualColumn> virtualColumnList = new ArrayList<>(2);
     for (ColumnInfo c : rowSchema.getSignature()) {
       String columnName = c.getInternalName();
-      if (VirtualColumn.ROWID.getName().equals(columnName)) {
-        hasRowId = true;
+      if (ALLOWED_VIRTUAL_COLUMNS.containsKey(columnName)) {
+        virtualColumnList.add(ALLOWED_VIRTUAL_COLUMNS.get(columnName));
       } else {
         if (VirtualColumn.VIRTUAL_COLUMN_NAMES.contains(columnName)) continue;

Review comment:
       Not really part of this patch, but this line can be merged with the previous one.


##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -4450,6 +4450,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true,
         "Enable writing the data files directly to the table's final destination instead of the staging directory."
         + "This optimization only applies on INSERT operations on ACID tables."),
+    HIVE_ACID_FETCH_DELETED_ROWS("hive.acid.fetch.deleted.rows", false,

Review comment:
       I am not sure this should be part of HiveConf.java, since it is not a global/session config per se?
Have you considered introducing it in `org.apache.hadoop.hive.conf.Constants`? We have introduced some of these config variables there in the past.


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -1970,8 +1970,22 @@ private int processQueryHint(ASTNode ast, QBParseInfo qbp, int posn) throws Sema
     String queryHintStr = ast.getText();
     LOG.debug("QUERY HINT: {} ", queryHintStr);
     try {
-      ASTNode hintNode = pd.parseHint(queryHintStr);
-      qbp.setHints(hintNode);
+      ASTNode hintListNode = pd.parseHint(queryHintStr);
+      qbp.setHints(hintListNode);
+      for (int i = 0; i < hintListNode.getChildCount(); ++i) {
+        ASTNode hintNode = (ASTNode) hintListNode.getChild(i);
+        if (hintNode.getChild(0).getType() != HintParser.TOK_FETCH_DELETED_ROWS) {

Review comment:
       Should this be a table property rather than a hint, i.e., the value for `hive.acid.fetch.deleted.rows` passed as a property of the table scan? That would follow a pattern similar to what we do in other code paths in Hive, e.g., to push computation for Druid/JDBC. It also means we would not need to introduce a new hint, which would simplify the patch.


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1943,15 +1943,17 @@ public static boolean isFullAcidScan(Configuration conf) {
    * we assume this is a full transactional table.
    */
   public static void setAcidOperationalProperties(
-      Configuration conf, boolean isTxnTable, AcidOperationalProperties properties) {
+      Configuration conf, boolean isTxnTable, AcidOperationalProperties properties, boolean fetchDeletedRows) {

Review comment:
       Since in most cases we would pass false for the last parameter, should we keep the old method signature too, and have it call this method with `false` for the last parameter?
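The backward-compatible overload the reviewer suggests is a common Java pattern: keep the old signature and have it delegate to the new one with the default value. A minimal standalone sketch of that idea (class, parameter types, and the `lastCall` field are illustrative stand-ins, not Hive's actual `AcidUtils` code):

```java
// Sketch of keeping an old method signature as a delegating overload,
// as suggested in the review. Names here are hypothetical examples.
public class AcidConfigExample {
    static String lastCall; // records the last invocation, for demonstration only

    // New signature with the extra flag.
    public static void setAcidOperationalProperties(
            String conf, boolean isTxnTable, boolean fetchDeletedRows) {
        lastCall = conf + "|txn=" + isTxnTable + "|fetchDeleted=" + fetchDeletedRows;
    }

    // Old signature, kept so existing callers stay source- and binary-compatible;
    // it delegates with the common default value for the new parameter.
    public static void setAcidOperationalProperties(String conf, boolean isTxnTable) {
        setAcidOperationalProperties(conf, isTxnTable, false);
    }

    public static void main(String[] args) {
        setAcidOperationalProperties("job-conf", true); // old call sites unchanged
        System.out.println(lastCall); // prints job-conf|txn=true|fetchDeleted=false
    }
}
```

This way only the (presumably few) call sites that actually fetch deleted rows need to pass the extra argument.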
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/BatchToRowReader.java
##########
@@ -108,25 +108,41 @@ public BatchToRowReader(RecordReader<NullWritable, VectorizedRowBatch> vrbReader
     } else {
       Arrays.fill(included, true);
     }
-    // Create struct for ROW__ID virtual column and extract index
-    this.rowIdIdx = vrbCtx.findVirtualColumnNum(VirtualColumn.ROWID);
-    if (this.rowIdIdx >= 0) {
-      included[rowIdIdx] = true;
+
+    virtualColumnHandlers = requestedVirtualColumns();
+    for (VirtualColumnHandler handler : virtualColumnHandlers) {
+      int idx = vrbCtx.findVirtualColumnNum(handler.virtualColumn);
+      if (idx >= 0) {
+        included[idx] = true;
+        handler.indexInSchema = idx;
+      }
    }
+
     if (LOG.isDebugEnabled()) {
       LOG.debug("Including the columns " + DebugUtils.toString(included));
     }
     this.included = included;
   }
+
+  public static class VirtualColumnHandler {

Review comment:
       Can you add a short comment for the inner class?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 570100)
            Time Spent: 20m  (was: 10m)

> Introduce virtual column ROW__IS__DELETED
> -----------------------------------------
>
>                 Key: HIVE-24855
>                 URL: https://issues.apache.org/jira/browse/HIVE-24855
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
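For the `BatchToRowReader` change above, the reviewer asks for a short comment on the `VirtualColumnHandler` inner class. A self-contained sketch of what a documented version of that class and the lookup loop from the patch could look like (types simplified to plain strings; `markIncluded` is a hypothetical stand-in for the constructor logic, not Hive's actual API):

```java
import java.util.Arrays;
import java.util.List;

public class VirtualColumnExample {

    /**
     * Pairs a requested virtual column with the index where it was found in
     * the row schema; indexInSchema stays -1 when the column is absent.
     */
    public static class VirtualColumnHandler {
        final String virtualColumn;
        int indexInSchema = -1;

        VirtualColumnHandler(String virtualColumn) {
            this.virtualColumn = virtualColumn;
        }
    }

    // Mirrors the patch's loop: mark each requested virtual column as
    // included and remember where it sits in the schema.
    static boolean[] markIncluded(List<String> schema, List<VirtualColumnHandler> handlers) {
        boolean[] included = new boolean[schema.size()];
        for (VirtualColumnHandler handler : handlers) {
            int idx = schema.indexOf(handler.virtualColumn);
            if (idx >= 0) {
                included[idx] = true;
                handler.indexInSchema = idx;
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "ROW__ID", "ROW__IS__DELETED");
        VirtualColumnHandler h = new VirtualColumnHandler("ROW__IS__DELETED");
        boolean[] included = markIncluded(schema, Arrays.asList(h));
        System.out.println(Arrays.toString(included) + " idx=" + h.indexInSchema);
        // prints [false, false, true] idx=2
    }
}
```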