[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572487#comment-16572487
 ] 

genericqa commented on HDFS-13088:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 46s | trunk passed |
| +1 | compile | 0m 56s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 1m 2s | trunk passed |
| +1 | shadedclient | 12m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 57s | trunk passed |
| +1 | javadoc | 0m 49s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 3s | the patch passed |
| +1 | javadoc | 0m 47s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 78m 16s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 142m 47s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13088 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934718/HDFS-13088.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fbd7c5b5d462 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 861095f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24722/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24722/testReport/ |
| Max. process+thread count | 3445 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-08-07 Thread Íñigo Goiri (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572361#comment-16572361
 ] 

Íñigo Goiri commented on HDFS-13088:


I think this is the least intrusive way to provide over-replication for 
provided blocks.
Ideally, this would be done at the file level but, as you mentioned, that would 
require many more changes, including interface changes.
I would vote to start with this; moving forward, we should have a story for what 
happens when both this setting and the per-file configuration are available.

> Allow HDFS files/blocks to be over-replicated.
> --
>
> Key: HDFS-13088
> URL: https://issues.apache.org/jira/browse/HDFS-13088
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Virajith Jalaparti
> Assignee: Virajith Jalaparti
> Priority: Major
> Attachments: HDFS-13088.001.patch, HDFS-13088.002.patch
>
>
> This JIRA is to add a per-file "over-replication" factor to HDFS. As 
> mentioned in HDFS-13069, the over-replication factor is the number of excess 
> replicas that are allowed to exist for a file or block. This is 
> beneficial if the application deems that additional replicas of a file are 
> needed. In the case of HDFS-13069, it would allow copies of data in PROVIDED 
> storage to be cached locally in HDFS in a read-through manner.
> The Namenode will not proactively meet the over-replication factor, i.e., it 
> does not schedule replications if the number of replicas for a block is less 
> than (replication factor + over-replication factor), as long as it is more 
> than the replication factor of the file.
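
The behaviour described in the issue summary can be sketched roughly as follows. This is only an illustration of the rule, not code from either patch: the class, enum, and method names are made up, and treating a block with exactly "replication factor" replicas as satisfied is an assumption based on how ordinary replication behaves.

```java
// Hypothetical sketch; none of these names come from the HDFS code base.
final class OverReplicationPolicy {

    enum Action { SCHEDULE_REPLICATION, LEAVE_AS_IS, REMOVE_EXCESS }

    /**
     * Decide what to do with a block given its live replica count.
     * Replicas between the replication factor and
     * (replication + overReplication) are tolerated but never requested.
     */
    static Action decide(int liveReplicas, int replication, int overReplication) {
        if (liveReplicas < 0 || replication < 1 || overReplication < 0) {
            throw new IllegalArgumentException("invalid replica counts");
        }
        if (liveReplicas < replication) {
            // Below the normal replication factor: replicate as usual.
            return Action.SCHEDULE_REPLICATION;
        }
        if (liveReplicas <= replication + overReplication) {
            // Extra copies are allowed to exist, but the Namenode does not
            // create them itself (assumed boundary: equal counts as satisfied).
            return Action.LEAVE_AS_IS;
        }
        // More copies than even the over-replication factor allows.
        return Action.REMOVE_EXCESS;
    }

    public static void main(String[] args) {
        // replication = 3, over-replication = 2
        for (int live = 2; live <= 6; live++) {
            System.out.println(live + " replicas: " + decide(live, 3, 2));
        }
    }
}
```

With a replication factor of 3 and an over-replication factor of 2, a block with 4 or 5 replicas is left alone, while a block with 6 has one replica too many.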



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-08-07 Thread Virajith Jalaparti (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572328#comment-16572328
 ] 

Virajith Jalaparti commented on HDFS-13088:
---

Thanks for the feedback [~elgoiri] and [~ehiggs].

 [^HDFS-13088.002.patch] takes an alternative approach to implementing this. It 
adds a new parameter {{dfs.provided.overreplication.factor}} which specifies 
how many extra replicas are allowed for blocks that are PROVIDED. This is a 
single value for all blocks/files in the system and is ephemeral (not 
necessarily retained across Namenode restarts unless the config value remains 
the same). However, it requires no changes to {{FileSystem}} or {{INodeFile}} 
and is much less intrusive.
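
As a rough illustration of how a single cluster-wide value could be applied, here is a minimal sketch. It uses a plain {{Map}} as a stand-in for the Hadoop configuration object; the class and method names, the default of 0, and the rule that non-PROVIDED blocks get no extra allowance are all assumptions for the example, not taken from the patch. Only the key name comes from the comment above.

```java
import java.util.Map;

// Hypothetical sketch; a Map stands in for the real configuration object.
final class ProvidedOverReplication {

    static final String KEY = "dfs.provided.overreplication.factor";
    static final int ASSUMED_DEFAULT = 0; // assumption: no extra replicas unless configured

    /** Upper bound on replicas tolerated for a block before excess ones are removed. */
    static int maxAllowedReplicas(Map<String, String> conf, int replication, boolean provided) {
        if (!provided) {
            // The setting only applies to PROVIDED blocks.
            return replication;
        }
        String raw = conf.getOrDefault(KEY, String.valueOf(ASSUMED_DEFAULT));
        int extra = Integer.parseInt(raw.trim());
        if (extra < 0) {
            throw new IllegalArgumentException(KEY + " must not be negative: " + extra);
        }
        return replication + extra;
    }

    public static void main(String[] args) {
        Map<String, String> conf = java.util.Collections.singletonMap(KEY, "2");
        System.out.println(maxAllowedReplicas(conf, 1, true));
        System.out.println(maxAllowedReplicas(conf, 3, false));
    }
}
```

Because the value is read from configuration rather than stored per file, changing it and restarting changes the allowance for every PROVIDED block at once, which is the "ephemeral" property mentioned above.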




[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-08-03 Thread Íñigo Goiri (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568438#comment-16568438
 ] 

Íñigo Goiri commented on HDFS-13088:


I think that adding over-replication to {{setReplication}} with a default of 0 
makes sense.
However, I'm torn between changing the existing method and adding an extra one 
with the new parameter.
The current approach in [^HDFS-13088.001.patch] adds the parameter and, as a 
result, changes all the other tests.
Maybe we should add a new method instead of changing the existing one.
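
A minimal sketch of the "add a new method" option, assuming the existing two-argument call simply delegates with an over-replication factor of 0 so that current callers and tests stay untouched. The class below is an in-memory stand-in written for illustration; it is not {{FileSystem}}, and the use of {{String}} paths and the validation rules are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in; not the real FileSystem API.
final class InMemoryReplicationStore {

    // path -> {replication, overReplication}
    private final Map<String, short[]> factors = new HashMap<>();

    /** Existing-style call: signature unchanged, over-replication defaults to 0. */
    boolean setReplication(String path, short replication) {
        return setReplication(path, replication, (short) 0);
    }

    /** New overload carrying the extra parameter. */
    boolean setReplication(String path, short replication, short overReplication) {
        if (replication < 1 || overReplication < 0) {
            return false;
        }
        factors.put(path, new short[] {replication, overReplication});
        return true;
    }

    short getReplication(String path) {
        short[] f = factors.get(path);
        return f == null ? 0 : f[0];
    }

    short getOverReplication(String path) {
        short[] f = factors.get(path);
        return f == null ? 0 : f[1];
    }

    public static void main(String[] args) {
        InMemoryReplicationStore store = new InMemoryReplicationStore();
        store.setReplication("/old-caller", (short) 3);
        store.setReplication("/new-caller", (short) 3, (short) 2);
        System.out.println(store.getOverReplication("/old-caller"));
        System.out.println(store.getOverReplication("/new-caller"));
    }
}
```

The trade-off is one more method on the interface in exchange for not having to touch every existing call site.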




[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-03-15 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400419#comment-16400419
 ] 

Ewan Higgs commented on HDFS-13088:
---

Do we want the new {{setReplication}} in {{FileSystem}}? If this is an HDFS 
feature, shouldn't it be in {{DistributedFileSystem}}? Alternatively, or in 
addition, this would need a capability as introduced in HADOOP-14707 
(discussed in HADOOP-11644).




[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-03-14 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399788#comment-16399788
 ] 

Íñigo Goiri commented on HDFS-13088:


I guess there is no way to add new fields to the header.
I think keeping backwards compatibility is pretty important.
In addition, we should support over-replication factors higher than 8.




[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-03-14 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399734#comment-16399734
 ] 

Virajith Jalaparti commented on HDFS-13088:
---

[~elgoiri] - An alternative would be to create a new field in {{INodeFile}} for 
the over-replication factor instead of changing the {{HeaderFormat}}.




[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-03-14 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399725#comment-16399725
 ] 

Íñigo Goiri commented on HDFS-13088:


Thanks [~virajith] for working on this.
We definitely see value for this feature once it's combined with tiered storage 
and caching.

The riskiest change here is the {{INodeFile}} change.
If there are files with replication >=256, the NameNode will fail to start.
This was also tweaked in HDFS-7866 for EC.
Is there a way we can add the over-replication factor without restricting the 
number of replicas?
Note that we would only allow 8 replicas for over-replication.




[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.

2018-03-14 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399706#comment-16399706
 ] 

Virajith Jalaparti commented on HDFS-13088:
---

Posting an initial patch to get feedback on the approach. The key changes are:
1) Add new {{setReplication(string, short, short)}} and 
{{getOverReplication}} calls to {{FileSystem}}.
2) Change the {{INodeFile#HeaderFormat}} so that, of the 11 bits that were 
reserved for the replication factor, 3 bits are used for over-replication.
3) Change the {{setrep}} command (and ClientNamenodeProtocol) to allow setting 
the over-replication factor on a file.

The idea behind the changes is that over-replication is a "new kind" of 
replication factor, so I modified the existing ways of setting the replication 
factor on a file to include over-replication.
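
To make the bit budget in change 2) concrete, here is a small sketch of splitting an 11-bit field into 8 bits of replication and 3 bits of over-replication. It is an illustration only: the class name, the helper names, and the choice to put the over-replication bits above the replication bits are assumptions, and the actual layout in the patch may differ.

```java
// Hypothetical sketch of an 11-bit field split 8 + 3; layout is assumed.
final class ReplicationBits {

    static final int TOTAL_BITS = 11;
    static final int OVER_BITS = 3;
    static final int REPL_BITS = TOTAL_BITS - OVER_BITS;   // 8 bits remain
    static final int MAX_REPL = (1 << REPL_BITS) - 1;      // 255
    static final int MAX_OVER = (1 << OVER_BITS) - 1;      // 7

    /** Pack both factors into the low 11 bits of an int. */
    static int pack(int replication, int overReplication) {
        if (replication < 0 || replication > MAX_REPL) {
            throw new IllegalArgumentException("replication out of range: " + replication);
        }
        if (overReplication < 0 || overReplication > MAX_OVER) {
            throw new IllegalArgumentException("over-replication out of range: " + overReplication);
        }
        return (overReplication << REPL_BITS) | replication;
    }

    static int replication(int packed) {
        return packed & MAX_REPL;
    }

    static int overReplication(int packed) {
        return (packed >>> REPL_BITS) & MAX_OVER;
    }

    public static void main(String[] args) {
        int packed = pack(3, 2);
        System.out.println(replication(packed) + " + " + overReplication(packed));
    }
}
```

Under this split, the replication field can no longer represent values above 255, and a 3-bit over-replication field holds values 0 through 7.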
