[jira] [Work logged] (HIVE-23235) Checkpointing in repl dump failing for orc format

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23235?focusedWorklogId=453167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453167
 ]

ASF GitHub Bot logged work on HIVE-23235:
-

Author: ASF GitHub Bot
Created on: 01/Jul/20 00:31
Start Date: 01/Jul/20 00:31
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #987:
URL: https://github.com/apache/hive/pull/987


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 453167)
Time Spent: 50m  (was: 40m)

> Checkpointing in repl dump failing for orc format
> -
>
> Key: HIVE-23235
> URL: https://issues.apache.org/jira/browse/HIVE-23235
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23235.01.patch, HIVE-23235.02.patch, 
> HIVE-23235.03.patch, HIVE-23235.04.patch, HIVE-23235.05.patch, 
> HIVE-23235.06.patch, HIVE-23235.07.patch, HIVE-23235.08.patch, 
> HIVE-23235.09.patch, HIVE-23235.10.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23235) Checkpointing in repl dump failing for orc format

2020-06-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23235?focusedWorklogId=450101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450101
 ]

ASF GitHub Bot logged work on HIVE-23235:
-

Author: ASF GitHub Bot
Created on: 24/Jun/20 00:25
Start Date: 24/Jun/20 00:25
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #987:
URL: https://github.com/apache/hive/pull/987#issuecomment-648503920


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 450101)
Time Spent: 40m  (was: 0.5h)

> Checkpointing in repl dump failing for orc format
> -
>
> Key: HIVE-23235
> URL: https://issues.apache.org/jira/browse/HIVE-23235
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23235.01.patch, HIVE-23235.02.patch, 
> HIVE-23235.03.patch, HIVE-23235.04.patch, HIVE-23235.05.patch, 
> HIVE-23235.06.patch, HIVE-23235.07.patch, HIVE-23235.08.patch, 
> HIVE-23235.09.patch, HIVE-23235.10.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23235) Checkpointing in repl dump failing for orc format

2020-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23235?focusedWorklogId=425588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-425588
 ]

ASF GitHub Bot logged work on HIVE-23235:
-

Author: ASF GitHub Bot
Created on: 21/Apr/20 07:13
Start Date: 21/Apr/20 07:13
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #987:
URL: https://github.com/apache/hive/pull/987#discussion_r411930184



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -383,10 +385,17 @@ private void doRegularCopyOnce(FileSystem sourceFs, 
List srcList,
 throw new IOException(e);
   }
 } else {
+  deleteSubDirs(destinationFs, destination);
   FileUtil.copy(sourceFs, paths, destinationFs, destination, false, true, 
hiveConf);
 }
   }
 
+  private void deleteSubDirs(FileSystem destinationFs, Path destination) 
throws IOException {
+for (FileStatus status : destinationFs.listStatus(destination)) {

Review comment:
   We need the destination folder to be present. Deleting it first and then 
recreating it would cost an extra FS call.
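
The trade-off in this comment (clear the contents of the destination but keep the directory itself, so it never has to be recreated) can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request, whose deleteSubDirs body is cut off in the diff above. It uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, and the class and method names (DeleteSubDirsSketch, deleteChildren, deleteRecursively) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: local-filesystem analogue of
// "delete everything under the destination, keep the destination".
class DeleteSubDirsSketch {

  // Removes every entry directly under dir (recursively), leaving dir in place.
  static void deleteChildren(Path dir) throws IOException {
    List<Path> children;
    // Snapshot the listing first so deletion does not race with iteration.
    try (Stream<Path> listing = Files.list(dir)) {
      children = listing.collect(Collectors.toList());
    }
    for (Path child : children) {
      deleteRecursively(child);
    }
  }

  // Deletes a file, or a directory tree bottom-up.
  private static void deleteRecursively(Path start) throws IOException {
    Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path d, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        // A directory can only be removed once its contents are gone.
        Files.delete(d);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```

Calling DeleteSubDirsSketch.deleteChildren(dest) leaves dest as an existing, empty directory, so a subsequent copy into it needs no separate create call. The alternative raised in the review, deleting dest itself and recreating it, reaches the same end state at the cost of that extra create.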







Issue Time Tracking
---

Worklog Id: (was: 425588)
Time Spent: 0.5h  (was: 20m)

> Checkpointing in repl dump failing for orc format
> -
>
> Key: HIVE-23235
> URL: https://issues.apache.org/jira/browse/HIVE-23235
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23235.01.patch, HIVE-23235.02.patch, 
> HIVE-23235.03.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23235) Checkpointing in repl dump failing for orc format

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23235?focusedWorklogId=425558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-425558
 ]

ASF GitHub Bot logged work on HIVE-23235:
-

Author: ASF GitHub Bot
Created on: 21/Apr/20 05:28
Start Date: 21/Apr/20 05:28
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #987:
URL: https://github.com/apache/hive/pull/987#discussion_r411869774



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -383,10 +385,17 @@ private void doRegularCopyOnce(FileSystem sourceFs, 
List srcList,
 throw new IOException(e);
   }
 } else {
+  deleteSubDirs(destinationFs, destination);
   FileUtil.copy(sourceFs, paths, destinationFs, destination, false, true, 
hiveConf);
 }
   }
 
+  private void deleteSubDirs(FileSystem destinationFs, Path destination) 
throws IOException {
+for (FileStatus status : destinationFs.listStatus(destination)) {

Review comment:
   Why can't we delete the destination and recreate the dest folder if 
required? 

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -761,6 +761,104 @@ public void testCheckPointingDataDumpFailureRegularCopy() 
throws Throwable {
 .verifyResults(new String[]{"11", "21"});
   }
 
+  @Test
+  public void testCheckPointingDataDumpFailureORCTableRegularCopy() throws 
Throwable {
+WarehouseInstance.Tuple bootstrapDump = primary.run("use " + primaryDbName)
+.run("CREATE TABLE t1(a int) clustered by (a) into 2 buckets" +
+" stored as orc TBLPROPERTIES ('transactional'='true')")
+.run("CREATE TABLE t2(a string) STORED AS TEXTFILE")

Review comment:
   Add one empty table too.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -761,6 +761,104 @@ public void testCheckPointingDataDumpFailureRegularCopy() 
throws Throwable {
 .verifyResults(new String[]{"11", "21"});
   }
 
+  @Test
+  public void testCheckPointingDataDumpFailureORCTableRegularCopy() throws 
Throwable {
+WarehouseInstance.Tuple bootstrapDump = primary.run("use " + primaryDbName)
+.run("CREATE TABLE t1(a int) clustered by (a) into 2 buckets" +
+" stored as orc TBLPROPERTIES ('transactional'='true')")
+.run("CREATE TABLE t2(a string) STORED AS TEXTFILE")
+.run("insert into t1 values (1)")
+.run("insert into t1 values (2)")
+.run("insert into t1 values (3)")
+.run("insert into t2 values (11)")
+.run("insert into t2 values (21)")
+.dump(primaryDbName);
+FileSystem fs = new Path(bootstrapDump.dumpLocation).getFileSystem(conf);
+Path dumpPath = new Path(bootstrapDump.dumpLocation, 
ReplUtils.REPL_HIVE_BASE_DIR);
+assertTrue(fs.exists(new Path(dumpPath, DUMP_ACKNOWLEDGEMENT.toString())));
+Path metadataPath = new Path(dumpPath, EximUtil.METADATA_PATH_NAME);
+long modifiedTimeMetadata = 
fs.getFileStatus(metadataPath).getModificationTime();
+Path dataPath = new Path(dumpPath, EximUtil.DATA_PATH_NAME);
+Path dbPath = new Path(dataPath, primaryDbName.toLowerCase());
+Path tablet1Path = new Path(dbPath, "t1");
+Path tablet2Path = new Path(dbPath, "t2");
+//Delete dump ack and t2 data, metadata should be rewritten, data should 
be same for t1 but rewritten for t2
+fs.delete(new Path(dumpPath, DUMP_ACKNOWLEDGEMENT.toString()), true);
+assertFalse(fs.exists(new Path(dumpPath, 
DUMP_ACKNOWLEDGEMENT.toString())));
+FileStatus[] statuses = fs.listStatus(tablet2Path);
+//Delete t2 data
+fs.delete(statuses[0].getPath(), true);
+long modifiedTimeTable1 = 
fs.getFileStatus(tablet1Path).getModificationTime();
+long modifiedTimeTable1CopyFile = 
fs.listStatus(tablet1Path)[0].getModificationTime();
+long modifiedTimeTable2 = 
fs.getFileStatus(tablet2Path).getModificationTime();
+//Do another dump. It should only dump table t2. Modification time of 
table t1 should be same while t2 is greater
+WarehouseInstance.Tuple nextDump = primary.dump(primaryDbName);
+assertEquals(nextDump.dumpLocation, bootstrapDump.dumpLocation);
+assertTrue(fs.exists(new Path(dumpPath, DUMP_ACKNOWLEDGEMENT.toString())));
+//File is copied again as we are using regular copy
+assertTrue(modifiedTimeTable1 < 
fs.getFileStatus(tablet1Path).getModificationTime());
+assertTrue(modifiedTimeTable1CopyFile < 
fs.listStatus(tablet1Path)[0].getModificationTime());
+assertTrue(modifiedTimeTable2 < 
fs.getFileStatus(tablet2Path).getModificationTime());
+assertTrue(modifiedTimeMetadata < 
fs.getFileStatus(metadataPath)

[jira] [Work logged] (HIVE-23235) Checkpointing in repl dump failing for orc format

2020-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23235?focusedWorklogId=425314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-425314
 ]

ASF GitHub Bot logged work on HIVE-23235:
-

Author: ASF GitHub Bot
Created on: 20/Apr/20 13:11
Start Date: 20/Apr/20 13:11
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #987:
URL: https://github.com/apache/hive/pull/987


   





Issue Time Tracking
---

Worklog Id: (was: 425314)
Remaining Estimate: 0h
Time Spent: 10m

> Checkpointing in repl dump failing for orc format
> -
>
> Key: HIVE-23235
> URL: https://issues.apache.org/jira/browse/HIVE-23235
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23235.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



