[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269879748 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: Done. Instead of last.repl.id, I am explicitly checking if the property is related to stats and then set isTxn only in case of replication. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269874353 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: Ok. Underneath this code is using the valid write id list created using open transaction list of repl. So, this isn't wrong. But this may change subject to the changes because of other comment you have. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269872622 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: If there were multiple transactions on the source running concurrently at a time, there will be those many open transaction events in the dump which when replicated will have those many open transactions at a time on the target while replaying those events. So, there could be multiple open transactions on target even during repl load. The only link between CreateTableOperation#createTableReplaceMode and Hive#alterTable is EnvironmentContext, so would could use this to pass a flag to indicate the valid writeId list should be created using the given writeId. But we are using Environment context to pass information only to the metastore and not use it in-between. We could construct the valid writeId list in the metastore directly like what we are doing for create table and partition using that kind of flag. Does that look good? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269856891 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: AFAIU, the reason we set replWriteId in CreateTableDesc is it can be then passed everywhere CreateTableDesc is used. It's better not to create two paths for passing same writeId, with a risk of those going of sync with each other. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269523732 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: The comment // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. itself looks wrong. I do not see any ADDPROPS usage which is updating statistics properties. All those seem to come through AddPartition and not alterTable for partitioned table. So, may be we can safely mark this as non-transactional always. Does that look right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269523732 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: itself is wrong. I do not see any ADDPROPS usage which is updating transactional properties. All those seem to come through AddPartition and not alterTable for partitioned table. So, may be we can safely mark this as non-transactional always. Does that look right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269508934 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ## @@ -118,7 +118,8 @@ List notNullConstraints; List defaultConstraints; List checkConstraints; - private ColumnStatistics colStats; + private ColumnStatistics colStats; // For the sake of replication + private long writeId = -1; // For the sake of replication Review comment: I was initially afraid that there could be other side-effects of this change. Your suggestion will bring all writeId replication through replWriteId, which is good. Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269476213 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: In getBaseAddPartitionDescFromPartition() where we use this function, we don't have access to the event message. Instead we are passing the writeId through ImportTableDesc by calling setReplWriteId(). This function just introduces the missing getReplWriteId() method symmetric to setReplWriteId(). If we use local variable and pass it around there is a possibility that local writeId variable can go out of sync with that in ImportTableDesc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269468871 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467769 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) { } // Update partition column statistics if available -for (Partition newPart : newParts) { - if (newPart.isSetColStats()) { -updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId()); +int cnt = 0; +for (ColumnStatistics partColStats: partsColStats) { + long writeId = partsWriteIds.get(cnt++); + // On replica craft a valid snapshot out of the writeId in the partition + String validWriteIds = null; + if (writeId > 0) { +ValidWriteIdList vwil = Review comment: Done. Please check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467699 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = + new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Done. Please check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467626 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269463941 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: " and also shouldn't be persisted". That's not true. We will persist the table stats but later. If you let me know which part of the comment is complex (needs simplification), will come up with alternate wording reflecting the same meaning. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269452642 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ## @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath, // Right now, we do not have a way of associating a writeId with statistics for a table // converted to a transactional table if it was non-transactional on the source. So, do not Review comment: Done. Looks like I missed pushing an entire commit fixing the comments. Done now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269425412 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ## @@ -1247,17 +1244,37 @@ private static void createReplImportTasks( } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) { x.getLOG().debug("adding dependent CopyWork/MoveWork for table"); -t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(), -new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId)); +dependentTasks = new ArrayList<>(1); +dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(), + new Path(tblDesc.getLocation()), replicationSpec, + x, writeId, stmtId)); } - if (dropTblTask != null) { -// Drop first and then create -dropTblTask.addDependentTask(t); -x.getTasks().add(dropTblTask); + // During replication, by the time we reply a commit transaction event, the table should + // have been already created when replaying previous events. So no need to create table + // again. For some reason we need create table task for partitioned table though. Review comment: Corrected. The partition case is already fixed, but the comment wasn't corrected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269424107 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts, new ArrayList(); try { AcidUtils.TableSnapshot tableSnapshot = null; + // TODO: In case of replication use the writeId and valid write id list constructed for Review comment: I have addressed this comment and removed it as well. But didn't commit the change and thus wasn't part of the PR. I have updated PR. This TODO is no more there. Sorry. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269423978 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { +isTxn = false; + } else { +isTxn = true; + } +} +// TODO: Somehow we have to signal alterPartitions that it's part of replication and +// should use replication's valid writeid list instead of creating one. Review comment: I have addressed this comment and removed it as well. But didn't commit the change and thus wasn't part of the PR. I have updated PR. This TODO is no more there. Sorry. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services