Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-11 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/
---

(Updated Feb. 11, 2016, 9:54 p.m.)


Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.


Changes
---

Fixed the test failures


Bugs: HIVE-12965
https://issues.apache.org/jira/browse/HIVE-12965


Repository: hive-git


Description
---

In Hive, "insert overwrite local directory" first deletes the overwritten 
directory if exists, recreate a new one, then copy the files from src directory 
to the new local directory. This process sometimes changes the permissions of 
the to-be-overwritten local directory, therefore causing some applications no 
more to be able to access its content.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 

Diff: https://reviews.apache.org/r/43176/diff/


Testing
---

Manual tests
Precommit tests


Thanks,

Chaoyu Tang



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-11 Thread Chaoyu Tang


> On Feb. 11, 2016, 10:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 110
> > 
> >
> > qq: So closing the file systems gave the issues seen in the test 
> > failures?

Yeah. Unexpected but it might be reasonable.


- Chaoyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118939
---


On Feb. 11, 2016, 9:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 11, 2016, 9:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-11 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118939
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 


qq: So closing the file systems gave the issues seen in the test failures?


- Xuefu Zhang


On Feb. 11, 2016, 9:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 11, 2016, 9:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-11 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118941
---


Ship it!




Ship It!

- Xuefu Zhang


On Feb. 11, 2016, 9:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 11, 2016, 9:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/
---

(Updated Feb. 9, 2016, 6:54 p.m.)


Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.


Changes
---

Update the patch based on the review.


Bugs: HIVE-12965
https://issues.apache.org/jira/browse/HIVE-12965


Repository: hive-git


Description
---

In Hive, "insert overwrite local directory" first deletes the overwritten 
directory if exists, recreate a new one, then copy the files from src directory 
to the new local directory. This process sometimes changes the permissions of 
the to-be-overwritten local directory, therefore causing some applications no 
more to be able to access its content.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 

Diff: https://reviews.apache.org/r/43176/diff/


Testing
---

Manual tests
Precommit tests


Thanks,

Chaoyu Tang



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118398
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 135)


I think the existance of a file/dir can be checked by calling FS.exists(). 
I usually avoid depending on exception to do something useful other than 
handling exception itself. However, it's up to you.



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 160)


it seems fs didn't get closed in all paths. However, I understand that the 
problem was there before.


- Xuefu Zhang


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118462
---


Ship it!




Ship It!

- Xuefu Zhang


On Feb. 9, 2016, 8:55 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 9, 2016, 8:55 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/
---

(Updated Feb. 9, 2016, 8:55 p.m.)


Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.


Changes
---

Revised patch based on Xuefu's comments.


Bugs: HIVE-12965
https://issues.apache.org/jira/browse/HIVE-12965


Repository: hive-git


Description
---

In Hive, "insert overwrite local directory" first deletes the overwritten 
directory if exists, recreate a new one, then copy the files from src directory 
to the new local directory. This process sometimes changes the permissions of 
the to-be-overwritten local directory, therefore causing some applications no 
more to be able to access its content.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 

Diff: https://reviews.apache.org/r/43176/diff/


Testing
---

Manual tests
Precommit tests


Thanks,

Chaoyu Tang



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-09 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review118442
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 120)


Do we need returned boolean? I don't see it's used.



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 171)


could we use FS.exists() to detect this instead of relying on exception 
handling? See below exmaple when checking the sourcePath.


- Xuefu Zhang


On Feb. 9, 2016, 6:54 p.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 9, 2016, 6:54 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-04 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review117816
---




ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 129)


two nested try-catch block seem making the code hard to read.



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 135)


Should we check if it's a directory before calling listStatus() on it?



ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java (line 144)


Instead of doing this, should we explicitly check the existance of the 
destination?


On a high level, deleting file one by one is slower. Could we instead remember 
the original permission and set it to the new directory that we are going to 
replace?

- Xuefu Zhang


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>



Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission

2016-02-04 Thread Chaoyu Tang


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> >

Thanks Xuefu for review. I answered all the questions and please let me know if 
they make sense.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 152
> > 
> >
> > Instead of doing this, should we explicitly check the existance of the 
> > destination?

Yeah, if the code is like following and first explicitly checks the existence 
of target, it might be more readable. Instead I combined the exists and 
listStatus into one listStatus call which can save one trip to HDFS in case 
that targetPath exists.

if (dstFs.exists(targetPath)) {
   FileStatus[] destFiles = dstFs.listStatus(targetPath);
   if (dstFs.isDirectory(targetPath)) {
 for (FileStatus destFile : destFiles) {
   if (!dstFs.delete(destFile.getPath(), true)) {
 throw new IOException("Unable to clean the destination directory: " + 
targetPath);
   }
 }
   }
} else {
   if (!FileUtils.mkdir(dstFs, targetPath, false, conf)) {
 throw new HiveException("Failed to create local target directory " + 
targetPath);
}


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 143
> > 
> >
> > Should we check if it's a directory before calling listStatus() on it?

By first calling FileStatus[] destFiles = dstFs.listStatus(targetPath), we can 
tell if the file with path targetPath exists or not. If yes, we then check if 
it is directory, otherwise it throws FileNotFoundException and in its catch 
block we create a directory with targetPath. (Please see explanation in next 
comment for the reason why it is implemented in this way).
But if we do dstFs.isDirectory(targetPath) first, for a non-existing directory, 
it returns false instead of throwing FileNotFoundException and going to 
FileNotFoundException block to create the target directory.
Yeah, I initally did in that way and found out that trick in the testing.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 137
> > 
> >
> > two nested try-catch block seem making the code hard to read.

Yeah, the outer try-catch
  } catch (IOException ioe) {
throw new HiveException("Unable to move source " + sourcePath + " to 
destination "
+ targetPath, ioe);
  }
Is mainly for logging purpose. Should we remove?


On Feb. 4, 2016, 2:51 p.m., Chaoyu Tang wrote:
> > On a high level, deleting file one by one is slower. Could we instead 
> > remember the original permission and set it to the new directory that we 
> > are going to replace?

Yeah, I initially implemented it in the way you suggested which is more 
performant. But it could not get the right permission to my local directory 
probably due to the defect in HDFS RawLocalFileSystem. It always return the 
HDFS default one (0666)to all local files/directory. I have put comment in code 
to explain why I delete the file one by one.
  // RawLocalFileSystem seems not able to get the right permissions for 
a local file, it
  // always returns hdfs default permission (00666). So we can not 
overwrite a directory
  // by deleting and recreating the directory and restoring its 
permissions. We should
  // delete all its files and subdirectories instead.


- Chaoyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review117816
---


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> ---
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
> https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if exists, recreate a new one, then copy the files from src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, therefore causing some 
> applications no more to be able to access its content.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> ---
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu