[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572487#comment-16572487 ] genericqa commented on HDFS-13088:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 27m 46s | trunk passed |
| +1 | compile | 0m 56s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 1m 2s | trunk passed |
| +1 | shadedclient | 12m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 57s | trunk passed |
| +1 | javadoc | 0m 49s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 3s | the patch passed |
| +1 | javadoc | 0m 47s | the patch passed |
|| Other Tests ||
| -1 | unit | 78m 16s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 142m 47s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13088 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934718/HDFS-13088.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fbd7c5b5d462 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 861095f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/24722/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/24722/testReport/ |
| Max. process+thread count | 3445 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572361#comment-16572361 ] Íñigo Goiri commented on HDFS-13088:

I think this is the least intrusive way to provide over-replication for provided blocks. Ideally, this would be done at the file level, but as you mentioned, that would require many more changes, including interface changes. I would vote to start with this; moving forward, we should have a story for what happens when both this setting and a per-file configuration are available.

> Allow HDFS files/blocks to be over-replicated.
> ----------------------------------------------
>
> Key: HDFS-13088
> URL: https://issues.apache.org/jira/browse/HDFS-13088
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Virajith Jalaparti
> Assignee: Virajith Jalaparti
> Priority: Major
> Attachments: HDFS-13088.001.patch, HDFS-13088.002.patch
>
> This JIRA is to add a per-file "over-replication" factor to HDFS. As mentioned in HDFS-13069, the over-replication factor is the number of excess replicas allowed to exist for a file or block. This is beneficial when an application deems that additional replicas of a file are needed. In the case of HDFS-13069, it would allow copies of data in PROVIDED storage to be cached locally in HDFS in a read-through manner.
> The Namenode will not proactively meet the over-replication, i.e., it does not schedule replications when the number of replicas for a block is less than (replication factor + over-replication factor), as long as it is at least the replication factor of the file.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572328#comment-16572328 ] Virajith Jalaparti commented on HDFS-13088:

Thanks for the feedback [~elgoiri] and [~ehiggs]. [^HDFS-13088.002.patch] is an alternate approach to implementing this: it adds a new parameter, {{dfs.provided.overreplication.factor}}, which specifies how many extra replicas are allowed for blocks that are PROVIDED. This is a single value for all blocks/files in the system and is ephemeral (not necessarily retained across Namenode restarts unless the config value remains the same). However, it requires no changes to {{FileSystem}} or {{INodeFile}} and is much less intrusive.
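As a sketch of how a deployment might use the setting described above (assuming the 002 patch is applied; the property name is taken from the comment, and the value here is only illustrative), the factor would go in hdfs-site.xml:

```xml
<!-- Illustrative only: allows up to 2 extra replicas for PROVIDED blocks,
     cluster-wide. Requires HDFS-13088.002.patch; not retained across
     Namenode restarts unless the value is kept in the config. -->
<property>
  <name>dfs.provided.overreplication.factor</name>
  <value>2</value>
</property>
```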
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568438#comment-16568438 ] Íñigo Goiri commented on HDFS-13088:

I think that adding over-replication, with a default of 0, to {{setReplication}} makes sense. However, I'm torn between changing the method and adding an extra one with the new parameter. The current approach in [^HDFS-13088.001.patch] adds the parameter and changes all the other tests. Maybe we should add a new method instead of changing the existing one.
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400419#comment-16400419 ] Ewan Higgs commented on HDFS-13088:

Do we want {{setReplication}} in {{FileSystem}}? If this is an HDFS feature, shouldn't it be in {{DistributedFileSystem}}? And/or this would need a capability as introduced in HADOOP-14707 (discussed in HADOOP-11644).
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399788#comment-16399788 ] Íñigo Goiri commented on HDFS-13088:

I guess there is no way to add new fields to the header. I think keeping backwards compatibility is pretty important. In addition, we should support over-replication factors higher than 8.
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399734#comment-16399734 ] Virajith Jalaparti commented on HDFS-13088:

[~elgoiri] - An alternative would be to create a new field in {{InodeFile}} for the over-replication factor instead of changing the {{HeaderFormat}}.
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399725#comment-16399725 ] Íñigo Goiri commented on HDFS-13088:

Thanks [~virajith] for working on this. We definitely see value in this feature once it's combined with tiered storage and caching. The riskiest change here is the {{INodeFile}} change: if there are files with replication >= 256, the NameNode will fail to start. This was also tweaked in HDFS-7866 for EC. Is there a way we can add over-replication without restricting the number of replicas? Note that we would only allow 8 replicas for over-replication.
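The compatibility concern above can be illustrated with a small sketch. This is not the actual {{INodeFile.HeaderFormat}} code; it is a hypothetical model of the bit split the 001 patch describes, where the 11 bits previously holding the replication factor are divided into 8 bits of replication and 3 bits of over-replication, so any existing file with replication >= 256 no longer fits:

```java
// Hypothetical model of the discussed header bit split (not Hadoop code).
public final class HeaderSketch {
    static final int REPLICATION_BITS = 8; // max replication 255 after the split
    static final int OVERREP_BITS = 3;     // over-replication limited to 3 bits

    static long pack(int replication, int overRep) {
        if (replication >= (1 << REPLICATION_BITS)) {
            // The compatibility risk: a pre-existing replication >= 256
            // can no longer be represented in 8 bits.
            throw new IllegalArgumentException("replication too large: " + replication);
        }
        if (overRep >= (1 << OVERREP_BITS)) {
            throw new IllegalArgumentException("over-replication too large: " + overRep);
        }
        return ((long) overRep << REPLICATION_BITS) | replication;
    }

    static int replication(long header) {
        return (int) (header & ((1 << REPLICATION_BITS) - 1));
    }

    static int overReplication(long header) {
        return (int) ((header >> REPLICATION_BITS) & ((1 << OVERREP_BITS) - 1));
    }
}
```

With this layout a round trip preserves both values, but `pack(300, 0)` throws, which mirrors why a NameNode loading such a file would fail.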
[jira] [Commented] (HDFS-13088) Allow HDFS files/blocks to be over-replicated.
[ https://issues.apache.org/jira/browse/HDFS-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399706#comment-16399706 ] Virajith Jalaparti commented on HDFS-13088:

Posting an initial patch to get feedback on the approach. The key changes are:
1) Add new {{setReplication(String, short, short)}} and {{getOverReplication}} calls to {{FileSystem}}.
2) Change {{InodeFile#HeaderFormat}} so that, of the 11 bits that were reserved for the replication factor, 3 bits are used for over-replication.
3) Change the {{setrep}} command (and {{ClientNamenodeProtocol}}) to allow setting the over-replication factor on a file.
The idea behind these changes is that over-replication is a new kind of replication factor, so I modified the existing ways of setting the replication factor on a file to include over-replication.
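The Namenode behavior described in the issue (no proactive replication to meet over-replication, extra replicas only tolerated up to the combined limit) can be sketched as a hypothetical helper; the class and method names below are illustrative, not actual NameNode code:

```java
// Hypothetical sketch of the scheduling rule from the issue description
// (not actual NameNode code): new replicas are scheduled only to meet the
// replication factor; replicas between the replication factor and
// (replication + over-replication) are tolerated but never created, and
// only replicas beyond the combined limit are treated as excess.
public final class OverReplicationPolicy {
    static boolean shouldScheduleReplication(int liveReplicas, int replication) {
        // Over-replication is never a scheduling target.
        return liveReplicas < replication;
    }

    static boolean isExcess(int liveReplicas, int replication, int overReplication) {
        // Extra replicas are deleted only beyond replication + over-replication.
        return liveReplicas > replication + overReplication;
    }
}
```

For example, with replication 3 and over-replication 2, a block with 4 live replicas triggers neither replication nor deletion; only a drop below 3 schedules work, and only a rise above 5 marks replicas as excess.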