[jira] [Updated] (HDFS-15638) Make Hive tables directory permission check flat

2020-10-16 Thread Xinli Shang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinli Shang updated HDFS-15638:
---
Description: 
Problem: Currently, when a user tries to access a file, he/she needs the 
permissions of its parent and ancestors as well as the permission on the file 
itself. This is correct in general, but for Hive table directories/files, all 
the files under a partition, or even a whole table, usually have the same 
permissions for the same set of ACL groups. Although the permissions and ACL 
groups are the same, the writer still needs to call setfacl() for every file, 
which results in a huge number of RPC calls to the NN. HDFS has default ACLs to 
address this, but they apply only to create and copy, not to rename; in Hive 
ETL, however, rename is very common. 

Proposal: Add a 1-bit flag to directory inodes to indicate whether the 
directory is a Hive table directory. If the flag is set, all sub-directories 
and files under it simply use its permission and ACL group settings, so Hive 
ETL doesn't need to set permissions at the file level. If the flag is not set 
(the default), permission checks work as before. Setting or clearing the flag 
would require admin privileges. 
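The flat check described above can be modeled in a few lines. This is only a 
sketch in Python, not actual NameNode code; the names `Inode`, `flat_acl`, and 
`check_access` are hypothetical illustrations of the proposal:

```python
# Sketch of the proposed flat permission check. All names (Inode,
# flat_acl, check_access) are hypothetical illustrations, not actual
# HDFS NameNode code.

class Inode:
    def __init__(self, name, allowed_users, flat_acl=False, parent=None):
        self.name = name
        self.allowed_users = set(allowed_users)  # stand-in for perms + ACLs
        self.flat_acl = flat_acl                 # the proposed 1-bit flag
        self.parent = parent

def check_access(inode, user):
    """Check path components root-first, as HDFS does today, but stop at
    the first directory carrying the flat-ACL flag: its settings then
    cover the whole subtree, and nothing below it is consulted."""
    chain = []
    node = inode
    while node is not None:
        chain.append(node)
        node = node.parent
    for node in reversed(chain):          # root first
        if node.flat_acl:
            return user in node.allowed_users
        if user not in node.allowed_users:
            return False
    return True

# A Hive table directory with the flag set; files beneath it need no ACLs.
warehouse = Inode("/warehouse", {"etl", "analyst"})
table = Inode("tbl", {"etl", "analyst"}, flat_acl=True, parent=warehouse)
part = Inode("dt=2020-10-16", set(), parent=table)
datafile = Inode("part-00000", set(), parent=part)
```

With the flag set on `tbl`, access to `datafile` succeeds even though the file 
itself carries no ACL entries; clearing the flag restores the per-component 
check, where the empty-ACL file would be inaccessible.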

  was:
Problem: Currently, when a user tries to access a file, he/she needs not only 
the permission on that file but also the permissions of its parent and 
ancestors. This is correct, but for Hive table directories/files, all the files 
under a partition, or even a whole table, usually have the same permissions for 
the same set of ACL groups. Although the permissions and ACL groups are the 
same, the writer sometimes still needs to call setfacl() for every file, which 
results in a huge number of RPC calls to the NN. HDFS has default ACLs to 
address this, but they apply only to create and copy, not to rename; in Hive 
ETL, however, rename is very common. 

Proposal: Add a 1-bit flag to directory inodes to indicate whether the 
directory is a Hive table directory. If the flag is set, all sub-directories 
and files under it simply use its permission and ACL group settings, so Hive 
ETL doesn't need to set permissions at the file level. If the flag is not set 
(the default), permission checks work as before. Setting or clearing the flag 
would require admin privileges. 


> Make Hive tables directory permission check flat 
> -
>
> Key: HDFS-15638
> URL: https://issues.apache.org/jira/browse/HDFS-15638
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Xinli Shang
>Priority: Major
>
> Problem: Currently, when a user tries to access a file, he/she needs the 
> permissions of its parent and ancestors as well as the permission on the file 
> itself. This is correct in general, but for Hive table directories/files, all 
> the files under a partition, or even a whole table, usually have the same 
> permissions for the same set of ACL groups. Although the permissions and ACL 
> groups are the same, the writer still needs to call setfacl() for every file, 
> which results in a huge number of RPC calls to the NN. HDFS has default ACLs 
> to address this, but they apply only to create and copy, not to rename; in 
> Hive ETL, however, rename is very common. 
> Proposal: Add a 1-bit flag to directory inodes to indicate whether the 
> directory is a Hive table directory. If the flag is set, all sub-directories 
> and files under it simply use its permission and ACL group settings, so Hive 
> ETL doesn't need to set permissions at the file level. If the flag is not set 
> (the default), permission checks work as before. Setting or clearing the flag 
> would require admin privileges. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15638) Make Hive tables directory permission check flat

2020-10-16 Thread Xinli Shang (Jira)
Xinli Shang created HDFS-15638:
--

 Summary: Make Hive tables directory permission check flat 
 Key: HDFS-15638
 URL: https://issues.apache.org/jira/browse/HDFS-15638
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Xinli Shang


Problem: Currently, when a user tries to access a file, he/she needs not only 
the permission on that file but also the permissions of its parent and 
ancestors. This is correct, but for Hive table directories/files, all the files 
under a partition, or even a whole table, usually have the same permissions for 
the same set of ACL groups. Although the permissions and ACL groups are the 
same, the writer sometimes still needs to call setfacl() for every file, which 
results in a huge number of RPC calls to the NN. HDFS has default ACLs to 
address this, but they apply only to create and copy, not to rename; in Hive 
ETL, however, rename is very common. 

Proposal: Add a 1-bit flag to directory inodes to indicate whether the 
directory is a Hive table directory. If the flag is set, all sub-directories 
and files under it simply use its permission and ACL group settings, so Hive 
ETL doesn't need to set permissions at the file level. If the flag is not set 
(the default), permission checks work as before. Setting or clearing the flag 
would require admin privileges. 






[jira] [Commented] (HDFS-2542) Transparent compression storage in HDFS

2019-11-19 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978088#comment-16978088
 ] 

Xinli Shang commented on HDFS-2542:
---

Any update on this? 

> Transparent compression storage in HDFS
> ---
>
> Key: HDFS-2542
> URL: https://issues.apache.org/jira/browse/HDFS-2542
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: jinglong.liujl
>Priority: Major
> Attachments: tranparent compress storage.docx
>
>
> As in HDFS-2115, we want to provide a mechanism to improve storage usage in 
> HDFS through compression. Unlike HDFS-2115, this issue focuses on compressed 
> storage. The idea is as follows:
> To do:
> 1. Compress cold data.
>    Cold data: data that no one has touched for a long time after it was 
> written (or last read).
>    Hot data: data that many clients will read after it is written; it may 
> also be deleted soon.
>
>    Because compressing hot data is not cost-effective, we only compress cold 
> data. 
>    In some cases, some data in a file is accessed frequently, while other 
> data in the same file is cold. 
> To distinguish them, we compress at the block level.
> 2. Compress only data with a high compression ratio.
>    To tell a high compression ratio from a low one, we try compressing the 
> data first; if the ratio is too low, we never compress it.
> 3. Forward compatibility.
>    After compression, the data format on the datanode changes, and old 
> clients cannot read it. To solve this, we provide a mechanism that 
> decompresses on the datanode.
> 4. Support random access and append.
>    As in HDFS-2115, random access can be supported by an index. We split the 
> data into fixed-length pieces before compressing (we call each fixed-length 
> piece a "chunk"), and every chunk has its own index entry.
> For random access, we seek to the nearest index entry and read that chunk to 
> reach the precise position.   
> 5. Compress asynchronously so compression does not slow down running jobs.
>    In practice, we found that cluster CPU usage is not uniform: some clusters 
> are idle at night, others in the afternoon. Compression tasks should run at 
> full speed when the cluster is idle and at low speed when it is busy.
> Will do:
> 1. Client-specific codecs and support for compressed transmission.
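
The chunk-index scheme for random access quoted above can be sketched as 
follows. This is a minimal Python model using zlib; the chunk size, function 
names, and index layout are illustrative assumptions, not the proposal's 
actual on-disk format:

```python
import zlib

CHUNK = 16  # fixed uncompressed chunk length (tiny here, for illustration)

def compress_chunked(data, chunk=CHUNK):
    """Compress fixed-length chunks independently and record each chunk's
    offset and length in the compressed stream -- the per-chunk index."""
    out = bytearray()
    index = []                            # index[i] = (offset, length)
    for i in range(0, len(data), chunk):
        c = zlib.compress(data[i:i + chunk])
        index.append((len(out), len(c)))
        out.extend(c)
    return bytes(out), index

def read_at(compressed, index, pos, chunk=CHUNK):
    """Random access: seek to the chunk containing `pos` via the index,
    decompress only that chunk, then offset into it for the exact byte."""
    off, length = index[pos // chunk]
    plain = zlib.decompress(compressed[off:off + length])
    return plain[pos % chunk]

data = bytes(range(100)) * 3
blob, idx = compress_chunked(data)
```

Because chunks are compressed independently, append is also compatible with 
this layout: new chunks and index entries are added at the end without 
rewriting earlier ones.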


