[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=446440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446440
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 11:44
Start Date: 16/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #587:
URL: https://github.com/apache/hive/pull/587


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446440)
Time Spent: 2.5h  (was: 2h 20m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The current implementation of compaction includes the txn id in the directory 
> name. This isolates queries from reading the directory until compaction has 
> finished and avoids the need for the compactor marking used earlier. During 
> bootstrap replication, a directory is copied as-is, with the same name, from 
> the source to the destination cluster. But a directory created by compaction 
> with a txn id cannot be copied this way, because the txn list at the target 
> may differ from the source: a txn id that is valid at the source may be an 
> aborted txn at the target. So conversion logic is required to create a new 
> directory with a valid txn at the target and dump the data into the newly 
> created directory.
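
As a rough illustration of the conversion described above, the sketch below strips a visibility txn id from a compaction directory name. It assumes that compaction output directories are named like `base_0000005_v0000012`, i.e. the write-id part followed by `_v` and the txn id. The class and method names are hypothetical, and the regex is a standalone stand-in, not the `AcidUtils` code used in the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hive.
class VisibilitySuffixSketch {
  // Assumed naming: <write-id part>[_v<txn id>], e.g. base_0000005_v0000012.
  private static final Pattern VISIBILITY_SUFFIX = Pattern.compile("^(.*)_v(\\d+)$");

  /** Returns the txn id embedded in the name, or 0 if the name has no such suffix. */
  static long visibilityTxnId(String dirName) {
    Matcher m = VISIBILITY_SUFFIX.matcher(dirName);
    return m.matches() ? Long.parseLong(m.group(2)) : 0L;
  }

  /** Returns the name to create at the target: the source name minus the txn id suffix. */
  static String targetDirName(String dirName) {
    Matcher m = VISIBILITY_SUFFIX.matcher(dirName);
    return m.matches() ? m.group(1) : dirName;
  }

  public static void main(String[] args) {
    System.out.println(targetDirName("base_0000005_v0000012"));
    System.out.println(targetDirName("delta_0000001_0000005_v0000012"));
    System.out.println(targetDirName("delta_0000003_0000003"));
  }
}
```

Names without the suffix pass through unchanged, so directories written by ordinary inserts would be copied under their original names.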



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=442975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442975
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 09/Jun/20 16:17
Start Date: 09/Jun/20 16:17
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #587:
URL: https://github.com/apache/hive/pull/587#issuecomment-641143683


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 442975)
Time Spent: 2h 20m  (was: 2h 10m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361580
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:35
Start Date: 20/Dec/19 10:35
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360316128
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
 ##
 @@ -670,4 +678,63 @@ public void testMultiDBTxn() throws Throwable {
      replica.run("drop database " + dbName1 + " cascade");
      replica.run("drop database " + dbName2 + " cascade");
    }
+
+  private void runCompaction(String dbName, String tblName, CompactionType compactionType) throws Throwable {
+    HiveConf hiveConf = new HiveConf(primary.getConf());
+    TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
+    txnHandler.compact(new CompactionRequest(dbName, tblName, compactionType));
+    hiveConf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, false);
+    runWorker(hiveConf);
+    runCleaner(hiveConf);
+  }
+
+  private FileStatus[] getDirsInTableLoc(WarehouseInstance wh, String db, String table) throws Throwable {
+    Path tblLoc = new Path(wh.getTable(db, table).getSd().getLocation());
+    FileSystem fs = tblLoc.getFileSystem(wh.getConf());
+    return fs.listStatus(tblLoc, EximUtil.getDirectoryFilter(fs));
+  }
+
+  @Test
+  public void testAcidTablesBootstrapWithCompaction() throws Throwable {
+    String tableName = testName.getMethodName();
+    primary.run("use " + primaryDbName)
+        .run("create table " + tableName + " (id int) clustered by(id) into 3 buckets stored as orc " +
+            "tblproperties (\"transactional\"=\"true\")")
+        .run("insert into " + tableName + " values(1)")
+        .run("insert into " + tableName + " values(2)");
+    runCompaction(primaryDbName, tableName, CompactionType.MAJOR);
 
 Review comment:
   Same as above.
 



Issue Time Tracking
---

Worklog Id: (was: 361580)
Time Spent: 2h 10m  (was: 2h)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361579
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:35
Start Date: 20/Dec/19 10:35
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360316053
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
 ##
 @@ -670,4 +678,63 @@ public void testMultiDBTxn() throws Throwable {
      replica.run("drop database " + dbName1 + " cascade");
      replica.run("drop database " + dbName2 + " cascade");
    }
+
+  private void runCompaction(String dbName, String tblName, CompactionType compactionType) throws Throwable {
+    HiveConf hiveConf = new HiveConf(primary.getConf());
+    TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
+    txnHandler.compact(new CompactionRequest(dbName, tblName, compactionType));
+    hiveConf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, false);
+    runWorker(hiveConf);
+    runCleaner(hiveConf);
+  }
+
+  private FileStatus[] getDirsInTableLoc(WarehouseInstance wh, String db, String table) throws Throwable {
+    Path tblLoc = new Path(wh.getTable(db, table).getSd().getLocation());
+    FileSystem fs = tblLoc.getFileSystem(wh.getConf());
+    return fs.listStatus(tblLoc, EximUtil.getDirectoryFilter(fs));
+  }
+
+  @Test
+  public void testAcidTablesBootstrapWithCompaction() throws Throwable {
+    String tableName = testName.getMethodName();
+    primary.run("use " + primaryDbName)
+        .run("create table " + tableName + " (id int) clustered by(id) into 3 buckets stored as orc " +
+            "tblproperties (\"transactional\"=\"true\")")
 
 Review comment:
   In case the logic changes in the future, we need a test as a safeguard.
 



Issue Time Tracking
---

Worklog Id: (was: 361579)
Time Spent: 2h  (was: 1h 50m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361577
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:34
Start Date: 20/Dec/19 10:34
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315778
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
      String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
      Path destination = destRoot;
      for (String subDir: subDirs) {
-      destination = new Path(destination, subDir);
+      // If the directory is created by compactor, then the directory will have the transaction id also.
+      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+      // aborted or live txn at target.
+      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+      // can be created. The validity txn id can be removed from the directory name.
+      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+      if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
 
 Review comment:
   Thanks for the explanation.
 



Issue Time Tracking
---

Worklog Id: (was: 361577)
Time Spent: 1h 40m  (was: 1.5h)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361578
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:35
Start Date: 20/Dec/19 10:35
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315864
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 ##
 @@ -1467,7 +1467,7 @@ public void replTableWriteIdState(ReplTblWriteIdStateRequest rqst) throws MetaEx
 
      // Schedule Major compaction on all the partitions/table to clean aborted data
      if (numAbortedWrites > 0) {
-      CompactionRequest compactRqst = new CompactionRequest(rqst.getDbName(), rqst.getTableName(),
+      CompactionRequest compactRqst = new CompactionRequest(dbName, tblName,
 
 Review comment:
   Ok.
 



Issue Time Tracking
---

Worklog Id: (was: 361578)
Time Spent: 1h 50m  (was: 1h 40m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361576
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:34
Start Date: 20/Dec/19 10:34
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315681
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
      String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
      Path destination = destRoot;
      for (String subDir: subDirs) {
-      destination = new Path(destination, subDir);
+      // If the directory is created by compactor, then the directory will have the transaction id also.
+      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+      // aborted or live txn at target.
+      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+      // can be created. The validity txn id can be removed from the directory name.
+      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+      if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
+        if (subDir.startsWith(AcidUtils.BASE_PREFIX)) {
+          AcidUtils.ParsedBase pb = AcidUtils.ParsedBase.parseBase(new Path(subDir));
+          destination = new Path(destination, AcidUtils.baseDir(pb.getWriteId()));
+        } else if (subDir.startsWith(AcidUtils.DELTA_PREFIX)) {
+          AcidUtils.ParsedDeltaLight pdl = AcidUtils.ParsedDeltaLight.parse(new Path(subDir));
+          destination = new Path(destination, AcidUtils.deltaSubdir(pdl.getMinWriteId(), pdl.getMaxWriteId()));
+        } else if (subDir.startsWith(AcidUtils.DELETE_DELTA_PREFIX)) {
+          AcidUtils.ParsedDeltaLight pdl = AcidUtils.ParsedDeltaLight.parse(new Path(subDir));
+          destination = new Path(destination, AcidUtils.deleteDeltaSubdir(pdl.getMinWriteId(), pdl.getMaxWriteId()));
 
 Review comment:
   Hmm, when the statement ids are actually used, this might become a problem. 
For now this looks ok.
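
To make the statement-id concern concrete: the hunk above rebuilds a delta name from the min and max write ids only, so any statement id in the source name would be dropped. The sketch below shows one way the name could be rebuilt while keeping it. It assumes a layout of `delta_<min>_<max>[_<stmt>][_v<txn>]` with 7-digit write ids and 4-digit statement ids; the class and method names are hypothetical, and none of this is the patch's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hive.
class DeltaNameSketch {
  // Assumed layout: (delta|delete_delta)_<min>_<max>[_<stmt>][_v<txn>]
  private static final Pattern DELTA = Pattern.compile(
      "^(delta|delete_delta)_(\\d+)_(\\d+)(?:_(\\d+))?(?:_v(\\d+))?$");

  /** Rebuilds a delta directory name without the txn id, keeping any statement id. */
  static String withoutVisibilityTxnId(String dirName) {
    Matcher m = DELTA.matcher(dirName);
    if (!m.matches()) {
      return dirName; // base directories and unknown names are out of scope here
    }
    long minWriteId = Long.parseLong(m.group(2));
    long maxWriteId = Long.parseLong(m.group(3));
    String rebuilt = String.format("%s_%07d_%07d", m.group(1), minWriteId, maxWriteId);
    if (m.group(4) != null) {
      // Statement id present: carry it over instead of silently dropping it.
      rebuilt += String.format("_%04d", Integer.parseInt(m.group(4)));
    }
    return rebuilt;
  }

  public static void main(String[] args) {
    System.out.println(withoutVisibilityTxnId("delta_0000005_0000005_0001_v0000009"));
    System.out.println(withoutVisibilityTxnId("delete_delta_0000002_0000004_v0000009"));
  }
}
```

Whether a directory can carry both a statement id and a visibility txn id is not settled in this thread, so the first input in `main` is hypothetical.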
 



Issue Time Tracking
---

Worklog Id: (was: 361576)
Time Spent: 1.5h  (was: 1h 20m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361574
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:33
Start Date: 20/Dec/19 10:33
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315217
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
      String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
      Path destination = destRoot;
      for (String subDir: subDirs) {
-      destination = new Path(destination, subDir);
+      // If the directory is created by compactor, then the directory will have the transaction id also.
+      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+      // aborted or live txn at target.
+      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+      // can be created. The validity txn id can be removed from the directory name.
+      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
 
 Review comment:
   The real question is whether there's a bug here if compaction happens on 
the source as well as on the target. It would be better to clarify that this 
logic needs to be reworked when incremental replication is worked on.
 



Issue Time Tracking
---

Worklog Id: (was: 361574)
Time Spent: 1h 10m  (was: 1h)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361575
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 20/Dec/19 10:33
Start Date: 20/Dec/19 10:33
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315411
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
      String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
      Path destination = destRoot;
      for (String subDir: subDirs) {
-      destination = new Path(destination, subDir);
+      // If the directory is created by compactor, then the directory will have the transaction id also.
+      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+      // aborted or live txn at target.
+      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+      // can be created. The validity txn id can be removed from the directory name.
+      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+      if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
+        if (subDir.startsWith(AcidUtils.BASE_PREFIX)) {
+          AcidUtils.ParsedBase pb = AcidUtils.ParsedBase.parseBase(new Path(subDir));
+          destination = new Path(destination, AcidUtils.baseDir(pb.getWriteId()));
 
 Review comment:
   Thanks for the explanation.
 



Issue Time Tracking
---

Worklog Id: (was: 361575)
Time Spent: 1h 20m  (was: 1h 10m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354254
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354280694
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
      String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
      Path destination = destRoot;
      for (String subDir: subDirs) {
-      destination = new Path(destination, subDir);
+      // If the directory is created by compactor, then the directory will have the transaction id also.
+      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+      // aborted or live txn at target.
+      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+      // can be created. The validity txn id can be removed from the directory name.
+      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+      if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
 
 Review comment:
   What does it mean for the visibility txn id to be less than 0?
 



Issue Time Tracking
---

Worklog Id: (was: 354254)
Time Spent: 40m  (was: 0.5h)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory 
> name. This isolates queries from reading the directory until compaction has 
> finished and avoids the compactor marking used earlier. In case of 
> replication, during bootstrap, the directory is copied as-is with the same 
> name from the source to the destination cluster. But a directory created by 
> compaction with a txn id cannot be copied, as the txn list at the target may 
> differ from the source. A txn id which is valid at the source may be an 
> aborted txn at the target. So conversion logic is required to create a new 
> directory with a valid txn at the target and dump the data to the newly 
> created directory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354258
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354286755
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
 ##
 @@ -670,4 +678,63 @@ public void testMultiDBTxn() throws Throwable {
 replica.run("drop database " + dbName1 + " cascade");
 replica.run("drop database " + dbName2 + " cascade");
   }
+
+  private void runCompaction(String dbName, String tblName, CompactionType compactionType) throws Throwable {
+HiveConf hiveConf = new HiveConf(primary.getConf());
+TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
+txnHandler.compact(new CompactionRequest(dbName, tblName, compactionType));
+hiveConf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, false);
+runWorker(hiveConf);
+runCleaner(hiveConf);
+  }
+
+  private FileStatus[] getDirsInTableLoc(WarehouseInstance wh, String db, String table) throws Throwable {
+Path tblLoc = new Path(wh.getTable(db, table).getSd().getLocation());
+FileSystem fs = tblLoc.getFileSystem(wh.getConf());
+return fs.listStatus(tblLoc, EximUtil.getDirectoryFilter(fs));
+  }
+
+  @Test
+  public void testAcidTablesBootstrapWithCompaction() throws Throwable {
+ String tableName = testName.getMethodName();
+ primary.run("use " + primaryDbName)
+.run("create table " + tableName + " (id int) clustered by(id) into 3 buckets stored as orc " +
+"tblproperties (\"transactional\"=\"true\")")
+.run("insert into " + tableName + " values(1)")
+.run("insert into " + tableName + " values(2)");
+runCompaction(primaryDbName, tableName, CompactionType.MAJOR);
 
 Review comment:
   What about a minor compaction?
 



Issue Time Tracking
---

Worklog Id: (was: 354258)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354255
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354281539
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 ##
 @@ -1467,7 +1467,7 @@ public void replTableWriteIdState(ReplTblWriteIdStateRequest rqst) throws MetaEx
 
 // Schedule Major compaction on all the partitions/table to clean aborted data
 if (numAbortedWrites > 0) {
-  CompactionRequest compactRqst = new CompactionRequest(rqst.getDbName(), rqst.getTableName(),
+  CompactionRequest compactRqst = new CompactionRequest(dbName, tblName,
 
 Review comment:
   Why do we need this change here?
 



Issue Time Tracking
---

Worklog Id: (was: 354255)
Time Spent: 50m  (was: 40m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354252
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354277500
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
 String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
 Path destination = destRoot;
 for (String subDir: subDirs) {
-  destination = new Path(destination, subDir);
+  // If the directory is created by compactor, then the directory will have the transaction id also.
+  // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+  // aborted or live txn at target.
+  // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+  // can be created. The validity txn id can be removed from the directory name.
+  // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+  if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
+if (subDir.startsWith(AcidUtils.BASE_PREFIX)) {
+  AcidUtils.ParsedBase pb = AcidUtils.ParsedBase.parseBase(new Path(subDir));
+  destination = new Path(destination, AcidUtils.baseDir(pb.getWriteId()));
 
 Review comment:
   It looks like we are removing the transaction id from the compactor-created 
directory name and leaving only the writeId. Are we replicating the 
corresponding transaction? How do we know whether this writeId is visible? I am 
assuming that a failed compaction will leave a directory behind, but its 
writeId won't be visible.
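The renaming under discussion can be sketched without Hive classes as follows. The sketch assumes base directories are named `base_<writeId>` with an optional `_v<txnId>` suffix and a 7-digit zero-padded write id; `BaseDirRewriteSketch` and `targetBaseDir` are hypothetical names, whereas the patch itself goes through `AcidUtils.ParsedBase` and `AcidUtils.baseDir`. It only shows the name change and does not settle whether the write id is visible at the target.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: drops the visibility txn id from a base directory name. */
final class BaseDirRewriteSketch {
    // Assumed layout: base_<writeId>[_v<txnId>]
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?");

    static String targetBaseDir(String sourceDirName) {
        Matcher m = BASE.matcher(sourceDirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a base directory: " + sourceDirName);
        }
        long writeId = Long.parseLong(m.group(1));
        // Re-render with the write id only; the 7-digit padding is an assumption.
        return String.format("base_%07d", writeId);
    }

    public static void main(String[] args) {
        System.out.println(targetBaseDir("base_0000005_v0000012")); // prints base_0000005
        System.out.println(targetBaseDir("base_0000005"));          // prints base_0000005
    }
}
```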
 



Issue Time Tracking
---

Worklog Id: (was: 354252)
Time Spent: 0.5h  (was: 20m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354256
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354286299
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
 ##
 @@ -670,4 +678,63 @@ public void testMultiDBTxn() throws Throwable {
 replica.run("drop database " + dbName1 + " cascade");
 replica.run("drop database " + dbName2 + " cascade");
   }
+
+  private void runCompaction(String dbName, String tblName, CompactionType compactionType) throws Throwable {
+HiveConf hiveConf = new HiveConf(primary.getConf());
+TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
+txnHandler.compact(new CompactionRequest(dbName, tblName, compactionType));
+hiveConf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, false);
+runWorker(hiveConf);
+runCleaner(hiveConf);
+  }
+
+  private FileStatus[] getDirsInTableLoc(WarehouseInstance wh, String db, String table) throws Throwable {
+Path tblLoc = new Path(wh.getTable(db, table).getSd().getLocation());
+FileSystem fs = tblLoc.getFileSystem(wh.getConf());
+return fs.listStatus(tblLoc, EximUtil.getDirectoryFilter(fs));
+  }
+
+  @Test
+  public void testAcidTablesBootstrapWithCompaction() throws Throwable {
+ String tableName = testName.getMethodName();
+ primary.run("use " + primaryDbName)
+.run("create table " + tableName + " (id int) clustered by(id) into 3 buckets stored as orc " +
+"tblproperties (\"transactional\"=\"true\")")
+.run("insert into " + tableName + " values(1)")
+.run("insert into " + tableName + " values(2)");
+runCompaction(primaryDbName, tableName, CompactionType.MAJOR);
+WarehouseInstance.Tuple bootstrapDump = primary.dump(primaryDbName, null);
+replica.load(replicatedDbName, bootstrapDump.dumpLocation);
+replica.run("use " + replicatedDbName)
+.run("show tables")
+.verifyResults(new String[] {tableName})
+.run("repl status " + replicatedDbName)
+.verifyResult(bootstrapDump.lastReplicationId)
+.run("select id from " + tableName + " order by id")
+.verifyResults(new String[]{"1", "2"});
+
+FileStatus[] dirsInLoadPath = getDirsInTableLoc(primary, primaryDbName, tableName);
+long writeId = -1;
+for (FileStatus fileStatus : dirsInLoadPath) {
+  if (fileStatus.getPath().getName().startsWith(AcidUtils.BASE_PREFIX)) {
+writeId = AcidUtils.ParsedBase.parseBase(fileStatus.getPath()).getWriteId();
+assertTrue(AcidUtils.getVisibilityTxnId(fileStatus.getPath().getName()) != -1);
+break;
+  }
+}
+//compaction is done so there should be a base directory.
+assertTrue(writeId != -1);
+
+dirsInLoadPath = getDirsInTableLoc(replica, replicatedDbName, tableName);
+for (FileStatus fileStatus : dirsInLoadPath) {
+  if (fileStatus.getPath().getName().startsWith(AcidUtils.BASE_PREFIX)) {
+assertTrue(writeId == AcidUtils.ParsedBase.parseBase(fileStatus.getPath()).getWriteId());
+assertTrue(AcidUtils.getVisibilityTxnId(fileStatus.getPath().getName()) == -1);
+writeId = -1;
+break;
+  }
+}
+//make sure that it has done the verification.
+assertTrue(writeId == -1);
 
 Review comment:
   Reusing writeId for verification saves a variable, but it is not very 
readable. Maybe save the writeId found on the replica in a separate variable 
and compare the writeId from the source with the one on the target.
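One way the suggestion could look, as a self-contained sketch rather than the actual test: each side's write id is looked up into its own variable and the two are compared explicitly. Directory listings are replaced by plain lists of names, and `WriteIdComparisonSketch`, `baseWriteId`, `sourceDirs` and `replicaDirs` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

/** Illustrative only: compares source and replica write ids via separate variables. */
final class WriteIdComparisonSketch {
    /** Returns the write id of the first base_* directory, if any (assumed name layout). */
    static OptionalLong baseWriteId(List<String> dirNames) {
        for (String name : dirNames) {
            if (name.startsWith("base_")) {
                // Take the digits between "base_" and the next "_" (or end of name).
                String digits = name.substring("base_".length()).split("_")[0];
                return OptionalLong.of(Long.parseLong(digits));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<String> sourceDirs = Arrays.asList("base_0000002_v0000009");
        List<String> replicaDirs = Arrays.asList("base_0000002");

        OptionalLong sourceWriteId = baseWriteId(sourceDirs);
        OptionalLong replicaWriteId = baseWriteId(replicaDirs);

        // Both sides must have a base directory, and the write ids must agree.
        if (!sourceWriteId.isPresent() || !replicaWriteId.isPresent()) {
            throw new AssertionError("expected a base directory on both sides");
        }
        if (sourceWriteId.getAsLong() != replicaWriteId.getAsLong()) {
            throw new AssertionError("write ids differ");
        }
        System.out.println("write ids match: " + sourceWriteId.getAsLong()); // prints write ids match: 2
    }
}
```

With separate variables there is no need to reset a shared value to -1 to prove that the second loop ran.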
 



Issue Time Tracking
---

Worklog Id: (was: 354256)
Time Spent: 1h  (was: 50m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar 

[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354253
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354280208
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
 String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
 Path destination = destRoot;
 for (String subDir: subDirs) {
-  destination = new Path(destination, subDir);
+  // If the directory is created by compactor, then the directory will have the transaction id also.
+  // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+  // aborted or live txn at target.
+  // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+  // can be created. The validity txn id can be removed from the directory name.
+  // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+  if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
+if (subDir.startsWith(AcidUtils.BASE_PREFIX)) {
+  AcidUtils.ParsedBase pb = AcidUtils.ParsedBase.parseBase(new Path(subDir));
+  destination = new Path(destination, AcidUtils.baseDir(pb.getWriteId()));
+} else if (subDir.startsWith(AcidUtils.DELTA_PREFIX)) {
+  AcidUtils.ParsedDeltaLight pdl = AcidUtils.ParsedDeltaLight.parse(new Path(subDir));
+  destination = new Path(destination, AcidUtils.deltaSubdir(pdl.getMinWriteId(), pdl.getMaxWriteId()));
+} else if (subDir.startsWith(AcidUtils.DELETE_DELTA_PREFIX)) {
+  AcidUtils.ParsedDeltaLight pdl = AcidUtils.ParsedDeltaLight.parse(new Path(subDir));
+  destination = new Path(destination, AcidUtils.deleteDeltaSubdir(pdl.getMinWriteId(), pdl.getMaxWriteId()));
 
 Review comment:
   We are ignoring the statement id, I think because it is always -1. Is that 
future-proof?
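A possible alternative that avoids rebuilding the name from parsed fields, sketched under the assumption that the visibility marker is always a trailing `_v<txnId>`: strip only that suffix, so every other part of the name (including a statement id, if one were ever present) is left untouched. `StripVisibilitySuffixSketch` and `stripVisibilitySuffix` are hypothetical names, and the example name containing a statement id is made up for illustration.

```java
import java.util.regex.Pattern;

/** Illustrative only: removes a trailing "_v<txnId>" and nothing else. */
final class StripVisibilitySuffixSketch {
    // Assumed marker format: "_v" followed by digits at the very end of the name.
    private static final Pattern VISIBILITY_SUFFIX = Pattern.compile("_v\\d+$");

    static String stripVisibilitySuffix(String dirName) {
        return VISIBILITY_SUFFIX.matcher(dirName).replaceFirst("");
    }

    public static void main(String[] args) {
        // Plain compacted delta: only the suffix is dropped.
        System.out.println(stripVisibilitySuffix("delta_0000001_0000005_v0000012"));
        // Made-up name with a statement id (0001): the statement id survives.
        System.out.println(stripVisibilitySuffix("delta_0000003_0000003_0001_v0000012"));
        // No suffix: the name is returned unchanged.
        System.out.println(stripVisibilitySuffix("delete_delta_0000001_0000005"));
    }
}
```

The trade-off is that this relies on the textual naming convention rather than on the parsed write ids, so it would need to track any change to that convention.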
 



Issue Time Tracking
---

Worklog Id: (was: 354253)
Time Spent: 40m  (was: 0.5h)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354251
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354223809
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
 String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
 Path destination = destRoot;
 for (String subDir: subDirs) {
-  destination = new Path(destination, subDir);
+  // If the directory is created by compactor, then the directory will have the transaction id also.
+  // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+  // aborted or live txn at target.
+  // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+  // can be created. The validity txn id can be removed from the directory name.
+  // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
 
 Review comment:
   Is this TODO resolved? If yes, please remove it. If it still needs to be 
resolved, maybe we should resolve it.
 



Issue Time Tracking
---

Worklog Id: (was: 354251)
Time Spent: 20m  (was: 10m)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-12-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=354257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354257
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 05/Dec/19 12:36
Start Date: 05/Dec/19 12:36
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #587: 
HIVE-21213 : Acid table bootstrap replication needs to handle directory created 
by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r354286550
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
 ##
 @@ -670,4 +678,63 @@ public void testMultiDBTxn() throws Throwable {
 replica.run("drop database " + dbName1 + " cascade");
 replica.run("drop database " + dbName2 + " cascade");
   }
+
+  private void runCompaction(String dbName, String tblName, CompactionType compactionType) throws Throwable {
+HiveConf hiveConf = new HiveConf(primary.getConf());
+TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
+txnHandler.compact(new CompactionRequest(dbName, tblName, compactionType));
+hiveConf.setBoolVar(HiveConf.ConfVars.COMPACTOR_CRUD_QUERY_BASED, false);
+runWorker(hiveConf);
+runCleaner(hiveConf);
+  }
+
+  private FileStatus[] getDirsInTableLoc(WarehouseInstance wh, String db, String table) throws Throwable {
+Path tblLoc = new Path(wh.getTable(db, table).getSd().getLocation());
+FileSystem fs = tblLoc.getFileSystem(wh.getConf());
+return fs.listStatus(tblLoc, EximUtil.getDirectoryFilter(fs));
+  }
+
+  @Test
+  public void testAcidTablesBootstrapWithCompaction() throws Throwable {
+ String tableName = testName.getMethodName();
+ primary.run("use " + primaryDbName)
+.run("create table " + tableName + " (id int) clustered by(id) into 3 buckets stored as orc " +
+"tblproperties (\"transactional\"=\"true\")")
 
 Review comment:
   Should we add a test for a partitioned table as well?
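A partitioned table puts partition directories in front of the base or delta directory, so the sub-directory path has more than one component. The sketch below mimics the per-component loop of `getCopyDestination` on plain strings to show what such a test would exercise. It assumes `/` as the separator and a trailing `_v<txnId>` visibility suffix; `SubDirWalkSketch`, `isAcidDir` and `rewriteRelativePath` are hypothetical names, not part of the patch.

```java
/** Illustrative only: rewrites each path component, touching ACID directories alone. */
final class SubDirWalkSketch {
    /** True for names that look like ACID base/delta directories (assumed prefixes). */
    private static boolean isAcidDir(String name) {
        return name.startsWith("base_")
            || name.startsWith("delta_")
            || name.startsWith("delete_delta_");
    }

    static String rewriteRelativePath(String subDir) {
        StringBuilder destination = new StringBuilder();
        for (String component : subDir.split("/")) {
            if (destination.length() > 0) {
                destination.append('/');
            }
            // Partition directories pass through unchanged, even if they end in "_v<digits>".
            destination.append(isAcidDir(component)
                ? component.replaceFirst("_v\\d+$", "")
                : component);
        }
        return destination.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewriteRelativePath("year=2019/base_0000005_v0000012"));
        // prints year=2019/base_0000005
        System.out.println(rewriteRelativePath("tag=foo_v2/delta_0000001_0000005_v0000012"));
        // prints tag=foo_v2/delta_0000001_0000005
    }
}
```

The second call shows the case a partitioned test could guard against: a partition value that happens to end in `_v2` must not be mistaken for a visibility suffix.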
 



Issue Time Tracking
---

Worklog Id: (was: 354257)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=221288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221288
 ]

ASF GitHub Bot logged work on HIVE-21213:
-

Author: ASF GitHub Bot
Created on: 01/Apr/19 13:29
Start Date: 01/Apr/19 13:29
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #587: HIVE-21213 : 
Acid table bootstrap replication needs to handle directory created by 
compaction with txn id
URL: https://github.com/apache/hive/pull/587
 
 
   …
 



Issue Time Tracking
---

Worklog Id: (was: 221288)
Time Spent: 10m
Remaining Estimate: 0h

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21213.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)