[jira] [Comment Edited] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338473#comment-16338473 ] Manoj Govindassamy edited comment on HDFS-12051 at 1/25/18 12:23 AM: - Thanks for working on this [~mi...@cloudera.com]. A few comments on HDFS-12051.07.patch {{NameCache.java}} * line 97: {{cache = new byte[cacheSize][];}} Since this allocates contiguous space, we need to restrict the cache size to something much smaller than the current MAX size of 1 << 30. Your thoughts? * {{#cache}} now follows the {{open addressing}} model. Any reason why you moved to this model from your initial design? * {{#put()}} ** line 118: on the first cache fill, shouldn't a new byte array be constructed from the passed-in name? Why reuse the caller's array directly? ** With the {{open addressing}} model, when you overwrite a cache slot with a new name, there could be INodes that still refer to the old name and are now cut off from the cache. Though their references remain valid, I want to understand why new names are preferred over the old ones. * I don't see any cache invalidation even when INodes are removed. This holds on to memory. Though not huge, design-wise it's not clean to leave the cache with stale values and incur a lookup penalty on future put() calls. * {{#getSize()}} Since there is no cache invalidation, and slots are overwritten in this open addressing model, the size returned is not accurate. * line 149: {{cacheSizeFor}} Does this round up or round down to the nearest power of two? Please add a link to {{HashMap#tableSizeFor()}} in the comment to show where the code is inspired from. was (Author: manojg): Thanks for working on this [~mi...@cloudera.com]. A few comments on HDFS-12051.07.patch {{NameCache.java}} * line 97: {{cache = new byte[cacheSize][];}} Since this allocates contiguous space, we need to restrict the cache size to something much smaller than the current MAX size of 1 << 30. Your thoughts? 
* {{#cache}} now follows the {{open addressing}} model. Any reason why you moved to this model from your initial design? * {{#put()}} ** line 118: on the first cache fill, shouldn't a new byte array be constructed from the passed-in name? Why reuse the caller's array directly? ** With the {{open addressing}} model, when you overwrite a cache slot with a new name, there could be INodes that still refer to the old name and are now cut off from the cache. * I don't see any cache invalidation even when INodes are removed. This holds on to memory. Though not huge, design-wise it's not clean to leave the cache with stale values and incur a lookup penalty on future put() calls. * {{#getSize()}} Since there is no cache invalidation, and slots are overwritten in this open addressing model, the size returned is not accurate. * line 149: {{cacheSizeFor}} Does this round up or round down to the nearest power of two? Please add a link to {{HashMap#tableSizeFor()}} in the comment to show where the code is inspired from. > Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly > those denoting file/directory names) to save memory > - > > Key: HDFS-12051 > URL: https://issues.apache.org/jira/browse/HDFS-12051 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: HDFS-12051.01.patch, HDFS-12051.02.patch, > HDFS-12051.03.patch, HDFS-12051.04.patch, HDFS-12051.05.patch, > HDFS-12051.06.patch, HDFS-12051.07.patch > > > When a snapshot diff operation is performed in a NameNode that manages several > million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap > dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays > result in 6.5% memory overhead, and most of these arrays are referenced by > {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}} > and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}: > {code:java} > 19. 
DUPLICATE PRIMITIVE ARRAYS > Types of duplicate objects: > Ovhd Num objs Num unique objs Class name > 3,220,272K (6.5%) 104749528 25760871 byte[] > > 1,841,485K (3.7%), 53194037 dup arrays (13158094 unique) > 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 > of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, > 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), > 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...),
[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338473#comment-16338473 ] Manoj Govindassamy commented on HDFS-12051: --- Thanks for working on this [~mi...@cloudera.com]. A few comments on HDFS-12051.07.patch {{NameCache.java}} * line 97: {{cache = new byte[cacheSize][];}} Since this allocates contiguous space, we need to restrict the cache size to something much smaller than the current MAX size of 1 << 30. Your thoughts? * {{#cache}} now follows the {{open addressing}} model. Any reason why you moved to this model from your initial design? * {{#put()}} ** line 118: on the first cache fill, shouldn't a new byte array be constructed from the passed-in name? Why reuse the caller's array directly? ** With the {{open addressing}} model, when you overwrite a cache slot with a new name, there could be INodes that still refer to the old name and are now cut off from the cache. * I don't see any cache invalidation even when INodes are removed. This holds on to memory. Though not huge, design-wise it's not clean to leave the cache with stale values and incur a lookup penalty on future put() calls. * {{#getSize()}} Since there is no cache invalidation, and slots are overwritten in this open addressing model, the size returned is not accurate. * line 149: {{cacheSizeFor}} Does this round up or round down to the nearest power of two? Please add a link to {{HashMap#tableSizeFor()}} in the comment to show where the code is inspired from. 
> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly > those denoting file/directory names) to save memory > - > > Key: HDFS-12051 > URL: https://issues.apache.org/jira/browse/HDFS-12051 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: HDFS-12051.01.patch, HDFS-12051.02.patch, > HDFS-12051.03.patch, HDFS-12051.04.patch, HDFS-12051.05.patch, > HDFS-12051.06.patch, HDFS-12051.07.patch > > > When snapshot diff operation is performed in a NameNode that manages several > million HDFS files/directories, NN needs a lot of memory. Analyzing one heap > dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays > result in 6.5% memory overhead, and most of these arrays are referenced by > {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}} > and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}: > {code:java} > 19. DUPLICATE PRIMITIVE ARRAYS > Types of duplicate objects: > Ovhd Num objs Num unique objs Class name > 3,220,272K (6.5%) 104749528 25760871 byte[] > > 1,841,485K (3.7%), 53194037 dup arrays (13158094 unique) > 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 > of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, > 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), > 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 > of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, > 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...) > ... 
and 45902395 more arrays, of which 13158084 are unique > <-- > org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name > <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode > <-- {j.u.ArrayList} <-- > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- > org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs > <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- > org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 > elements) ... <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0 > <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java > Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER > 409,830K (0.8%), 13482787 dup arrays (13260241 unique) > 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of
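For illustration, the open-addressing interning scheme discussed in the review above can be sketched as follows. This is a hypothetical sketch, not the HDFS-12051 patch itself: the class name {{NameInterner}} and its methods are invented, and only {{tableSizeFor}} mirrors the {{java.util.HashMap#tableSizeFor()}} rounding that the review refers to. Note how {{put()}} copies the caller's array and how a colliding slot is simply overwritten — the two behaviors questioned in the review.

```java
import java.util.Arrays;

class NameInterner {
    private final byte[][] cache;  // one contiguous slot array, as in the reviewed patch
    private final int mask;        // size - 1; valid because size is a power of two

    NameInterner(int capacity) {
        int size = tableSizeFor(capacity);
        cache = new byte[size][];
        mask = size - 1;
    }

    // Round up to the next power of two, in the style of java.util.HashMap#tableSizeFor.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : n + 1;
    }

    // Returns the canonical copy of name. On a slot collision the newer name
    // overwrites the older one; INodes holding the old array keep a valid
    // reference but are "cut off" from the cache, as the review notes.
    byte[] put(byte[] name) {
        int slot = Arrays.hashCode(name) & mask;
        byte[] cached = cache[slot];
        if (cached != null && Arrays.equals(cached, name)) {
            return cached;                        // hit: intern to the cached array
        }
        byte[] copy = Arrays.copyOf(name, name.length); // defensive copy of the caller's array
        cache[slot] = copy;                       // miss or collision: overwrite the slot
        return copy;
    }
}
```

Because overwritten slots silently drop older names, any getSize()-style count derived from this table would not reflect the names still referenced by live INodes, which is the accounting concern raised above.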
[jira] [Commented] (HDFS-11225) NameNode crashed because deleteSnapshot held FSNamesystem lock too long
[ https://issues.apache.org/jira/browse/HDFS-11225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335277#comment-16335277 ] Manoj Govindassamy commented on HDFS-11225: --- [~shashikant], please feel free to own this bug and follow up with your proposal. I am on to other things and not able to spend time on this. My apologies. > NameNode crashed because deleteSnapshot held FSNamesystem lock too long > --- > > Key: HDFS-11225 > URL: https://issues.apache.org/jira/browse/HDFS-11225 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 > Environment: CDH5.8.2, HA >Reporter: Wei-Chiu Chuang >Assignee: Manoj Govindassamy >Priority: Critical > Labels: high-availability > Attachments: Snaphot_Deletion_Design_Proposal.pdf > > > The deleteSnapshot operation is synchronous. In certain situations this > operation may hold the FSNamesystem lock for too long, bringing almost every > NameNode operation to a halt. > We observed one incident where it took so long that ZKFC believed the > NameNode was down. All other IPC threads were waiting to acquire the FSNamesystem > lock. This specific deleteSnapshot took ~70 seconds. ZKFC has a connection > timeout of 45 seconds by default; if all IPC threads are waiting for the > FSNamesystem lock and new incoming connections cannot be accepted, ZKFC times > out, advances the epoch, and the NameNode therefore loses its active NN role and > then fails. 
> Relevant log: > {noformat} > Thread 154 (IPC Server handler 86 on 8020): > State: RUNNABLE > Blocked count: 2753455 > Waited count: 89201773 > Stack: > > org.apache.hadoop.hdfs.server.namenode.INode$BlocksMapUpdateInfo.addDeleteBlock(INode.java:879) > > org.apache.hadoop.hdfs.server.namenode.INodeFile.destroyAndCollectBlocks(INodeFile.java:508) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeReference.destroyAndCollectBlocks(INodeReference.java:339) > > org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.destroyAndCollectBlocks(INodeReference.java:606) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.destroyDeletedList(DirectoryWithSnapshotFeature.java:119) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.access$400(DirectoryWithSnapshotFeature.java:61) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:319) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:167) > > org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:83) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:745) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776) > > 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:747) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:789) > {noformat} > After the ZKFC determined NameNode was down and advanced epoch, the NN > finished deleting snapshot, and sent the edit to journal nodes, but it was > rejected because epoch was updated. See the following stacktrace: > {noformat} > 10.0.16.21:8485: IPC's epoch 17 is less than the last promised epoch 18 > at > org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:429) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:457) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:352) > at >
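A common mitigation shape for this class of problem is to split the long-running deletion into batches and release the global lock between batches, so IPC handlers (and the ZKFC health check) can make progress. The sketch below only illustrates that generic shape and is not the attached design proposal; {{BatchedDeleter}}, {{BATCH_SIZE}}, and the fair {{ReentrantLock}} standing in for the FSNamesystem write lock are all invented for illustration.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class BatchedDeleter {
    static final int BATCH_SIZE = 1000;
    // Stand-in for the FSNamesystem write lock that deleteSnapshot holds.
    private final ReentrantLock fsLock = new ReentrantLock(true); // fair, so waiters are not starved

    int deleteAll(List<String> inodes) {
        int deleted = 0;
        for (int i = 0; i < inodes.size(); i += BATCH_SIZE) {
            fsLock.lock();
            try {
                int end = Math.min(i + BATCH_SIZE, inodes.size());
                for (int j = i; j < end; j++) {
                    // Placeholder for per-inode work such as the
                    // destroyAndCollectBlocks() calls in the stack trace above.
                    deleted++;
                }
            } finally {
                // Yield the lock between batches instead of holding it for
                // the full ~70-second deletion described in this issue.
                fsLock.unlock();
            }
        }
        return deleted;
    }
}
```

The trade-off is that the namespace can change between batches, so a real implementation must make each batch independently consistent rather than assuming one atomic deletion.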
[jira] [Commented] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329062#comment-16329062 ] Manoj Govindassamy commented on HDFS-11847: --- [~jlowe], My bad. My intention was only to use branch-3.0 and not create a new branch. I will check my scripts. Thanks for spotting this. Please let me know the corrective actions and I will follow them. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Major > Fix For: 3.1.0, 3.0.1 > > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, > HDFS-11847.03.patch, HDFS-11847.04.patch, HDFS-11847.05.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > machines might be added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. Proposal > here is to add suboptions to {{listOpenFiles}} for the above case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324606#comment-16324606 ] Manoj Govindassamy commented on HDFS-11847: --- Given that HDFS-10480 is available in 3.0, backported both HDFS-11847 and HDFS-11848 to branch-3.
[jira] [Commented] (HDFS-11848) Enhance dfsadmin listOpenFiles command to list files under a given path
[ https://issues.apache.org/jira/browse/HDFS-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324604#comment-16324604 ] Manoj Govindassamy commented on HDFS-11848: --- [~linyiqun], Given that HDFS-10480 is available in 3.0, backported both HDFS-11847 and HDFS-11848 to branch-3. Hope this is ok with you. Please let me know if otherwise. > Enhance dfsadmin listOpenFiles command to list files under a given path > --- > > Key: HDFS-11848 > URL: https://issues.apache.org/jira/browse/HDFS-11848 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Yiqun Lin > Fix For: 3.1.0, 3.0.1 > > Attachments: HDFS-11848.001.patch, HDFS-11848.002.patch, > HDFS-11848.003.patch, HDFS-11848.004.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > One more thing that would be nice here is to filter the output on a passed > path or DataNode. Use cases: An admin might already know a stale file by path > (perhaps from fsck's -openforwrite), and wants to figure out who the lease > holder is. Proposal here is to add suboptions to {{listOpenFiles}} to list files > filtered by path. > {{LeaseManager#getINodeWithLeases(INodeDirectory)}} can be used to get the > open file list for any given ancestor directory.
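The path-based filtering proposed here boils down to selecting open files whose paths fall under a given ancestor directory. A minimal sketch of that selection logic, assuming a plain list of path strings rather than the real INode walk via {{LeaseManager#getINodeWithLeases(INodeDirectory)}} ({{openFilesUnder}} is an invented name):

```java
import java.util.ArrayList;
import java.util.List;

class OpenFileFilter {
    // Returns the open-file paths that lie under the given ancestor directory.
    // A path matches if it equals the ancestor or is nested beneath it;
    // "/a/bc" must not match ancestor "/a/b", hence the separator check.
    static List<String> openFilesUnder(List<String> openFiles, String ancestor) {
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        List<String> result = new ArrayList<>();
        for (String path : openFiles) {
            if (path.equals(ancestor) || path.startsWith(prefix)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

The separator check is the easy-to-miss detail: naive {{startsWith(ancestor)}} would wrongly report files under sibling directories whose names share a prefix.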
[jira] [Updated] (HDFS-11848) Enhance dfsadmin listOpenFiles command to list files under a given path
[ https://issues.apache.org/jira/browse/HDFS-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11848: -- Fix Version/s: 3.0.1
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Fix Version/s: 3.0.1
[jira] [Commented] (HDFS-12994) TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout
[ https://issues.apache.org/jira/browse/HDFS-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319031#comment-16319031 ] Manoj Govindassamy commented on HDFS-12994: --- Got it. Patch v01 looks good to me. +1, thanks for working on this. > TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket > timeout > > > Key: HDFS-12994 > URL: https://issues.apache.org/jira/browse/HDFS-12994 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12994.00.patch, HDFS-12994.01.patch > > > Occasionally, {{testNNSendsErasureCodingTasks}} fails due to socket timeout > {code} > 2017-12-26 20:35:19,961 [StripedBlockReconstruction-0] INFO > datanode.DataNode (StripedBlockReader.java:createBlockReader(132)) - > Exception while creating remote block reader, datanode 127.0.0.1:34145 > java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.newConnectedPeer(StripedBlockReader.java:148) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.createBlockReader(StripedBlockReader.java:123) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.<init>(StripedBlockReader.java:83) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.createReader(StripedReader.java:169) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.initReaders(StripedReader.java:150) > at > 
org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.init(StripedReader.java:133) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:56) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > while the target datanode is removed in the test: > {code} > 2017-12-26 20:35:18,710 [Thread-2393] INFO net.NetworkTopology > (NetworkTopology.java:remove(219)) - Removing a node: > /default-rack/127.0.0.1:34145 > {code}
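Flakiness of this kind, where a block reader races with datanode removal, is often absorbed by retrying the connection attempt a bounded number of times. The sketch below shows only that generic retry shape; it is not the fix in HDFS-12994.01.patch, and {{RetryingConnector}}/{{connectWithRetries}} are invented names.

```java
import java.util.concurrent.Callable;

class RetryingConnector {
    // Runs the connect action up to maxAttempts times, returning the first
    // success and rethrowing the last failure. In a test context this absorbs
    // transient ConnectExceptions while a datanode is being torn down.
    static <T> T connectWithRetries(Callable<T> connect, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e; // remember the failure; a real impl might also back off here
            }
        }
        throw last; // all attempts failed (assumes maxAttempts >= 1)
    }
}
```

A bounded retry count matters: unbounded retries would simply convert a fast "Connection refused" failure into the socket-timeout hang described in this issue's title.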
[jira] [Comment Edited] (HDFS-12994) TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout
[ https://issues.apache.org/jira/browse/HDFS-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319031#comment-16319031 ] Manoj Govindassamy edited comment on HDFS-12994 at 1/9/18 7:51 PM: --- Got it. Patch v01 looks good to me. +1, thanks for working on this. was (Author: manojg): Got it. Patch v02 looks good to me. +1, thanks for working on this.
[jira] [Comment Edited] (HDFS-12994) TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout
[ https://issues.apache.org/jira/browse/HDFS-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319031#comment-16319031 ] Manoj Govindassamy edited comment on HDFS-12994 at 1/9/18 7:51 PM: --- Got it. Patch v02 looks good to me. +1, thanks for working on this. was (Author: manojg): Got it. Patch v01 looks good to me. +1, thanks for working on this.
[jira] [Commented] (HDFS-12994) TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout
[ https://issues.apache.org/jira/browse/HDFS-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319023#comment-16319023 ] Manoj Govindassamy commented on HDFS-12994: --- [~eddyxu], The intention is to let client detect DN issues quicker? And, the problem should happen always when the DN is removed right? Just trying to understand the core issue. Thanks. > TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket > timeout > > > Key: HDFS-12994 > URL: https://issues.apache.org/jira/browse/HDFS-12994 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12994.00.patch, HDFS-12994.01.patch > > > Occasionally, {{testNNSendsErasureCodingTasks}} fails due to socket timeout > {code} > 2017-12-26 20:35:19,961 [StripedBlockReconstruction-0] INFO > datanode.DataNode (StripedBlockReader.java:createBlockReader(132)) - > Exception while creating remote block reader, datanode 127.0.0.1:34145 > java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.newConnectedPeer(StripedBlockReader.java:148) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.createBlockReader(StripedBlockReader.java:123) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReader.(StripedBlockReader.java:83) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.createReader(StripedReader.java:169) > at > 
org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.initReaders(StripedReader.java:150) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.init(StripedReader.java:133) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:56) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > while the target datanode is removed in the test: > {code} > 2017-12-26 20:35:18,710 [Thread-2393] INFO net.NetworkTopology > (NetworkTopology.java:remove(219)) - Removing a node: > /default-rack/127.0.0.1:34145 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
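The failure above boils down to a reconstruction task trying to connect to a DN that was just removed. The general remedy is to bound the connect attempt and fall through to another source peer. A minimal sketch of that idea in plain java.net terms (illustrative only; {{PeerConnectSketch}} is a hypothetical class, not the actual StripedBlockReader code or the attached patch):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

/** Illustrative sketch (not HDFS code): try each candidate source peer
 *  with a bounded connect timeout instead of letting one refused or
 *  removed DN fail the whole reconstruction task. */
public class PeerConnectSketch {
  static Socket connectToAnyPeer(List<InetSocketAddress> peers, int timeoutMs)
      throws IOException {
    IOException last = null;
    for (InetSocketAddress peer : peers) {
      Socket s = new Socket();
      try {
        s.connect(peer, timeoutMs);   // bounded wait; a removed DN fails fast
        return s;
      } catch (IOException e) {       // ConnectException / SocketTimeoutException
        last = e;
        s.close();
      }
    }
    if (last == null) {
      last = new IOException("no peers given");
    }
    throw last;
  }
}
```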
[jira] [Comment Edited] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317352#comment-16317352 ] Manoj Govindassamy edited comment on HDFS-12985 at 1/9/18 12:31 AM: Thanks for the review [~yzhangal]. Committed it to trunk and branch-2. was (Author: manojg): Thanks for the review [~yzhangal]. Committed it to trunk. > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0, 2.10.0 > > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12985: -- Resolution: Fixed Fix Version/s: 2.10.0 3.1.0 Target Version/s: 3.1.0, 2.10.0 (was: 3.1.0) Status: Resolved (was: Patch Available) > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0, 2.10.0 > > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317352#comment-16317352 ] Manoj Govindassamy commented on HDFS-12985: --- Thanks for the review [~yzhangal]. Committed it to trunk. > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11848) Enhance dfsadmin listOpenFiles command to list files under a given path
[ https://issues.apache.org/jira/browse/HDFS-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313613#comment-16313613 ] Manoj Govindassamy commented on HDFS-11848: --- Thanks for the patch revision. Looks good overall. +1, with a few more nit questions below. 1. {{TestDFSAdmin:778}} since the path is "", shouldn't it list all open files, and hence shouldn't the validation be against {{openFilesMap}} instead of {{openFiles1}}? 2. The input paths are treated as plain Strings, right? That is, even if the input path is not a valid path it can still filter the results. Say "/dir1/dir2/d" can filter files for both "/dir1/dir2/dir3/" and "/dir1/dir2/dir4/". 3. If (2) is true, would it be useful to accept the input path as a regex pattern? Totally OK with me to skip this or take it up in a different jira. > Enhance dfsadmin listOpenFiles command to list files under a given path > --- > > Key: HDFS-11848 > URL: https://issues.apache.org/jira/browse/HDFS-11848 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Yiqun Lin > Attachments: HDFS-11848.001.patch, HDFS-11848.002.patch, > HDFS-11848.003.patch > > > HDFS-10480 adds {{listOpenFiles}} option is to {{dfsadmin}} command to list > all the open files in the system. > One more thing that would be nice here is to filter the output on a passed > path or DataNode. Usecases: An admin might already know a stale file by path > (perhaps from fsck's -openforwrite), and wants to figure out who the lease > holder is. Proposal here is add suboptions to {{listOpenFiles}} to list files > filtered by path. > {{LeaseManager#getINodeWithLeases(INodeDirectory)}} can be used to get the > open file list for any given ancestor directory. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
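The question above about input paths being treated as plain strings comes down to prefix matching: a filter string that is not a complete path component still matches several subtrees. A self-contained sketch of that behavior ({{PrefixFilterDemo}} is a hypothetical class for illustration, not DFSAdmin code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative sketch (not HDFS code): plain string-prefix filtering
 *  over-matches when the filter is not a complete path component. */
public class PrefixFilterDemo {
  static List<String> filter(List<String> openFiles, String pathPrefix) {
    return openFiles.stream()
        .filter(p -> p.startsWith(pathPrefix))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> open = Arrays.asList(
        "/dir1/dir2/dir3/a", "/dir1/dir2/dir4/b", "/other/c");
    // "/dir1/dir2/d" is not a real directory, yet it matches both subtrees.
    System.out.println(filter(open, "/dir1/dir2/d"));
    // prints [/dir1/dir2/dir3/a, /dir1/dir2/dir4/b]
  }
}
```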
[jira] [Commented] (HDFS-11225) NameNode crashed because deleteSnapshot held FSNamesystem lock too long
[ https://issues.apache.org/jira/browse/HDFS-11225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312365#comment-16312365 ] Manoj Govindassamy commented on HDFS-11225: --- [~shashikant], Thanks for the proposal. Will take a look and get back to you. > NameNode crashed because deleteSnapshot held FSNamesystem lock too long > --- > > Key: HDFS-11225 > URL: https://issues.apache.org/jira/browse/HDFS-11225 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 > Environment: CDH5.8.2, HA >Reporter: Wei-Chiu Chuang >Assignee: Manoj Govindassamy >Priority: Critical > Labels: high-availability > Attachments: Snaphot_Deletion_Design_Proposal.pdf > > > The deleteSnapshot operation is synchronous. In certain situations this > operation may hold FSNamesystem lock for too long, bringing almost every > NameNode operation to a halt. > We have observed one incidence where it took so long that ZKFC believes the > NameNode is down. All other IPC threads were waiting to acquire FSNamesystem > lock. This specific deleteSnapshot took ~70 seconds. ZKFC has connection > timeout of 45 seconds by default, and if all IPC threads wait for > FSNamesystem lock and can't accept new incoming connection, ZKFC times out, > advances epoch and NameNode will therefore lose its active NN role and then > fail. 
> Relevant log: > {noformat} > Thread 154 (IPC Server handler 86 on 8020): > State: RUNNABLE > Blocked count: 2753455 > Waited count: 89201773 > Stack: > > org.apache.hadoop.hdfs.server.namenode.INode$BlocksMapUpdateInfo.addDeleteBlock(INode.java:879) > > org.apache.hadoop.hdfs.server.namenode.INodeFile.destroyAndCollectBlocks(INodeFile.java:508) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763) > > org.apache.hadoop.hdfs.server.namenode.INodeReference.destroyAndCollectBlocks(INodeReference.java:339) > > org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.destroyAndCollectBlocks(INodeReference.java:606) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.destroyDeletedList(DirectoryWithSnapshotFeature.java:119) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.access$400(DirectoryWithSnapshotFeature.java:61) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:319) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:167) > > org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:83) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:745) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776) > > 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747) > > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:747) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747) > > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:789) > {noformat} > After the ZKFC determined NameNode was down and advanced epoch, the NN > finished deleting snapshot, and sent the edit to journal nodes, but it was > rejected because epoch was updated. See the following stacktrace: > {noformat} > 10.0.16.21:8485: IPC's epoch 17 is less than the last promised epoch 18 > at > org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:429) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:457) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:352) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149) > at >
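The direction explored in the attached design proposal, and the usual cure for long lock hold times in general, is to split the bulk deletion into bounded batches and release the global lock between batches so other operations can make progress. A toy sketch of the pattern only ({{ChunkedDelete}} is hypothetical; the real FSNamesystem lock and block invalidation are far more involved):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative sketch (not NameNode code): process a large deletion in
 *  bounded batches, dropping the lock between batches so concurrent
 *  operations are not starved for the whole deletion. */
public class ChunkedDelete {
  static final int BATCH = 1000;

  static int deleteInBatches(ReentrantLock fsLock, Deque<Long> blocksToDelete) {
    int deleted = 0;
    while (!blocksToDelete.isEmpty()) {
      fsLock.lock();
      try {
        for (int i = 0; i < BATCH && !blocksToDelete.isEmpty(); i++) {
          blocksToDelete.pop();   // stand-in for invalidating one block
          deleted++;
        }
      } finally {
        fsLock.unlock();          // lock released between batches
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    Deque<Long> q = new ArrayDeque<>();
    for (long i = 0; i < 2500; i++) q.push(i);
    System.out.println(deleteInBatches(new ReentrantLock(), q));
    // prints 2500
  }
}
```

The trade-off is that the namespace is briefly observable mid-deletion, which is exactly the consistency question a real design proposal has to settle.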
[jira] [Commented] (HDFS-11848) Enhance dfsadmin listOpenFiles command to list files under a given path
[ https://issues.apache.org/jira/browse/HDFS-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312361#comment-16312361 ] Manoj Govindassamy commented on HDFS-11848: --- Thanks for the patch contribution [~linyiqun]. Overall looks good to me. Here are a few minor comments: 1. {{DFSAdmin:467}} To be consistent with the rest of the options, the indentation change can be restored to the old one. 2. {{DFSAdmin:935}} Any benefits of using StringUtils here? The implementation is missing trim() before the empty check. 3. {{DFSAdmin:2148}} Would this catch the case where the -path option is not provided with any path? 4. {{HDFSCommands.md:412}} (1) -path option missing. (2) "Open files list can filtered by given type or path. " should this be "Open files list will be filtered by given type and path. " 5. {{TestDFSAdmin:761}} Please add a test case to verify -path without any path arguments, and with an empty path "" 6. Nit: In a few places the default value for the path is given as null and in all other places it is given as OpenFilesIterator.FILTER_PATH_DEFAULT. Better if we can be consistent with the default value usage for simplicity. > Enhance dfsadmin listOpenFiles command to list files under a given path > --- > > Key: HDFS-11848 > URL: https://issues.apache.org/jira/browse/HDFS-11848 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Yiqun Lin > Attachments: HDFS-11848.001.patch, HDFS-11848.002.patch > > > HDFS-10480 adds {{listOpenFiles}} option is to {{dfsadmin}} command to list > all the open files in the system. > One more thing that would be nice here is to filter the output on a passed > path or DataNode. Usecases: An admin might already know a stale file by path > (perhaps from fsck's -openforwrite), and wants to figure out who the lease > holder is. Proposal here is add suboptions to {{listOpenFiles}} to list files > filtered by path. 
> {{LeaseManager#getINodeWithLeases(INodeDirectory)}} can be used to get the > open file list for any given ancestor directory. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
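Review point 2 above ({{DFSAdmin:935}}, trim() before the empty check) can be shown in isolation. A minimal sketch with a hypothetical helper, not the actual DFSAdmin implementation:

```java
/** Illustrative sketch (hypothetical helper, not DFSAdmin code):
 *  trim the argument before testing for emptiness, so a
 *  whitespace-only -path argument is treated the same as a missing one. */
public class PathArgCheck {
  static boolean isEmptyPathArg(String path) {
    return path == null || path.trim().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(isEmptyPathArg("   "));  // prints true
    System.out.println(isEmptyPathArg("/a"));   // prints false
  }
}
```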
[jira] [Commented] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312158#comment-16312158 ] Manoj Govindassamy commented on HDFS-12985: --- Above unit test failures are not related to the patch. Will take care of the checkstyle issue in the next patch revision after review. > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12985: -- Status: Patch Available (was: Open) > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12985: -- Attachment: HDFS-12985.01.patch Attached v01 to address the following: 1. {{INodeFile#cleanSubtree()}} updates {{ReclaimContext#removedUCFiles}} after deleting the snapshot file. 2. {{FSDirDeleteOp#deleteInternal}} already takes care of removing the leases for removedUCFiles and removedINodes. 3. New unit test {{TestOpenFilesWithSnapshot#testOpenFileDeletionAndNNRestart}} added to reproduce the problem and verify the fix. > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
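The crash and the fix above can be modeled abstractly: the lease manager's set of open-file ids must never contain an inode that has been destroyed, and the {{removedUCFiles}} bookkeeping in item 1 restores that invariant so the leases get released in item 2. A toy model of the invariant only ({{LeaseInvariantSketch}} is hypothetical; it is not HDFS code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Abstract sketch (hypothetical model, not HDFS code): ids tracked as
 *  leased must always resolve to a live inode, or startup accounting
 *  dereferences null — the reported NPE in getNumUnderConstructionBlocks. */
public class LeaseInvariantSketch {
  final Map<Long, String> inodesById = new HashMap<>(); // inode id -> path
  final Set<Long> leasedIds = new HashSet<>();          // open-for-write files

  void openFile(long id, String path) {
    inodesById.put(id, path);
    leasedIds.add(id);
  }

  /** Stand-in for counting under-construction files at startup: throws
   *  NPE when a leased id no longer maps to an inode. */
  int countOpenFiles() {
    int n = 0;
    for (long id : leasedIds) {
      n += inodesById.get(id).isEmpty() ? 0 : 1; // NPE if inode was destroyed
    }
    return n;
  }

  /** The fix, abstractly: destroying the last copy of an open file must
   *  also release its lease (the removedUCFiles bookkeeping). */
  void destroyInode(long id, boolean releaseLease) {
    inodesById.remove(id);
    if (releaseLease) {
      leasedIds.remove(id);
    }
  }
}
```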
[jira] [Comment Edited] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311928#comment-16311928 ] Manoj Govindassamy edited comment on HDFS-12985 at 1/4/18 7:46 PM: --- Attached v01 to address the following: 1. {{INodeFile#cleanSubtree()}} updates {{ReclaimContext#removedUCFiles}} after deleting the snapshot file. 2. {{FSDirDeleteOp#deleteInternal}} already takes care of removing the leases for removedUCFiles and removedINodes. 3. New unit test {{TestOpenFilesWithSnapshot#testOpenFileDeletionAndNNRestart}} added to reproduce the problem and verify the fix. [~yzhangal], [~eddyxu], can you please take a look at the patch? was (Author: manojg): Attached v01 to address the following: 1. {{INodeFile#cleanSubtree()}} updates {{ReclaimContext#removedUCFiles}} after deleting the snapshot file. 2. {{FSDirDeleteOp#deleteInternal}} already take care of removing the leases for removedUCFiles and removedINodes. 3. New unit test {{TestOpenFilesWithSnapshot#testOpenFileDeletionAndNNRestart}} added to show the problem and the fix solving the same. > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12985: -- Description: NameNode crashes repeatedly with NPE at the startup when trying to find the total number of under construction blocks. This crash happens after an open file, which was also part of a snapshot gets deleted along with the snapshot. {noformat} Failed to start namenode. java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) {noformat} was: NameNode crashes repeatedly with NPE at the startup when trying to find the total number of under construction blocks. This crash happens after an open file, which was also part of a snapshot gets deleted along with the snapshot. 
{noformat} java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:144) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:4456) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1158) at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:825) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:751) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:968) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:947) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2110) at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2075) at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testSnapshotsForOpenFilesAndDeletion3(TestOpenFilesWithSnapshot.java:747) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat} > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: 
Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) >
[jira] [Created] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
Manoj Govindassamy created HDFS-12985: - Summary: NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted Key: HDFS-12985 URL: https://issues.apache.org/jira/browse/HDFS-12985 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.8.0 Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy NameNode crashes repeatedly with an NPE at startup when trying to find the total number of under-construction blocks. This crash happens after an open file, which was also part of a snapshot, gets deleted along with the snapshot. {noformat}
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:144)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:4456)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1158)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:825)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:751)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:968)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:947)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1674)
at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2110)
at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2075)
at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testSnapshotsForOpenFilesAndDeletion3(TestOpenFilesWithSnapshot.java:747)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
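The failure mode above can be illustrated in isolation: the lease manager still holds a lease for a file whose INode can no longer be resolved once the file and its containing snapshot are both deleted. Below is a minimal, hypothetical sketch of the defensive pattern, using simplified stand-in types rather than the actual HDFS classes:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for org.apache.hadoop.hdfs.server.namenode.INodeFile. */
class INodeFile {
  private final int ucBlocks;
  INodeFile(int ucBlocks) { this.ucBlocks = ucBlocks; }
  int numUnderConstructionBlocks() { return ucBlocks; }
}

public class LeaseScanSketch {
  /**
   * Counts under-construction blocks across leased files. A lease whose
   * INode no longer resolves (file deleted together with its snapshot) is
   * skipped instead of being dereferenced; dereferencing the null is what
   * triggers the NPE in the unguarded version.
   */
  static long getNumUnderConstructionBlocks(Map<Long, INodeFile> inodeMap,
                                            List<Long> leasedInodeIds) {
    long total = 0;
    for (long inodeId : leasedInodeIds) {
      INodeFile file = inodeMap.get(inodeId);  // null for stale leases
      if (file == null) {
        continue;  // log-and-skip keeps NameNode startup alive
      }
      total += file.numUnderConstructionBlocks();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<Long, INodeFile> inodeMap = new HashMap<>();
    inodeMap.put(1L, new INodeFile(2));
    // inode 2 was deleted along with its snapshot: no entry in the map,
    // but a stale lease for it is still present.
    System.out.println(getNumUnderConstructionBlocks(inodeMap, Arrays.asList(1L, 2L)));  // prints 2
  }
}
```

Whether the real fix should skip stale leases or remove them at delete time is a design question for the patch; the sketch only shows why the unguarded scan crashes.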
[jira] [Commented] (HDFS-11848) Enhance dfsadmin listOpenFiles command to list files under a given path
[ https://issues.apache.org/jira/browse/HDFS-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310469#comment-16310469 ] Manoj Govindassamy commented on HDFS-11848: --- Thanks for posting a patch revision [~linyiqun]. Sorry for the delay, will review this week. > Enhance dfsadmin listOpenFiles command to list files under a given path > --- > > Key: HDFS-11848 > URL: https://issues.apache.org/jira/browse/HDFS-11848 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Yiqun Lin > Attachments: HDFS-11848.001.patch, HDFS-11848.002.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > One more thing that would be nice here is to filter the output on a passed > path or DataNode. Use case: an admin might already know a stale file by path > (perhaps from fsck's -openforwrite), and wants to figure out who the lease > holder is. The proposal here is to add suboptions to {{listOpenFiles}} to list files > filtered by path. > {{LeaseManager#getINodeWithLeases(INodeDirectory)}} can be used to get the > open file list for any given ancestor directory.
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Thanks for the review [~eddyxu]. Committed to trunk. > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0 > > Attachments: HDFS-12629.01.patch, HDFS-12629.02.patch, > NN_UI_Summary_BlockCount_AfterFix.png, NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count.
[jira] [Commented] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308884#comment-16308884 ] Manoj Govindassamy commented on HDFS-11847: --- Thanks for the review [~xiaochen]. Took care of the checkstyle and javadoc issues. Committed to trunk. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, > HDFS-11847.03.patch, HDFS-11847.04.patch, HDFS-11847.05.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Attachment: HDFS-12629.02.patch Thanks for the review [~eddyxu]. Will commit soon to trunk. Re-attaching the same patch as v02 to overcome the HDFS precommit build issue. > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12629.01.patch, HDFS-12629.02.patch, > NN_UI_Summary_BlockCount_AfterFix.png, NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: HDFS-11847.05.patch Thanks [~xiaochen] for the review. Attached v05 patch to address the following. Please take a look at the latest patch.
1. HDFS-12969 is tracking the enhancements needed for the {{dfsAdmin -listOpenFiles}} command.
2. Restored the old API in the client packages.
3. {{FSN#getFilesBlockingDecom}} now returns a batched list honoring {{maxListOpenFilesResponses}}.
4. Restored the old reporting format.
5. Surprisingly, I didn't see this change in the IDE. I was able to get this unnecessary change removed after a fresh pull.
Also, updated the test case to cover the batched response for listing open files by type. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, > HDFS-11847.03.patch, HDFS-11847.04.patch, HDFS-11847.05.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
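Item 3 above mentions returning a batched list honoring {{maxListOpenFilesResponses}}. As an illustration of the cursor-based batching pattern behind that kind of API (the names below are invented for the sketch, not the actual HDFS signatures):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Minimal sketch of batched listing over a sorted id space, assuming a
 *  numeric inode-id cursor; illustrative only. */
public class BatchedListingSketch {
  /** Returns up to maxResponses ids strictly greater than the cursor. */
  static List<Long> listBatch(NavigableSet<Long> openFileIds, long prevId, int maxResponses) {
    List<Long> batch = new ArrayList<>();
    for (long id : openFileIds.tailSet(prevId, false)) {  // exclusive of the cursor
      if (batch.size() >= maxResponses) {
        break;  // cap each RPC response at maxResponses entries
      }
      batch.add(id);
    }
    return batch;
  }

  public static void main(String[] args) {
    NavigableSet<Long> ids = new TreeSet<>(Arrays.asList(1L, 3L, 5L, 7L));
    long cursor = 0;
    List<List<Long>> pages = new ArrayList<>();
    while (true) {
      // The client re-issues the call with the last id it saw, so no
      // server-side state is needed between batches.
      List<Long> page = listBatch(ids, cursor, 2);
      if (page.isEmpty()) {
        break;
      }
      pages.add(page);
      cursor = page.get(page.size() - 1);
    }
    System.out.println(pages);  // prints [[1, 3], [5, 7]]
  }
}
```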
[jira] [Created] (HDFS-12969) DfsAdmin listOpenFiles should report files by type
Manoj Govindassamy created HDFS-12969: - Summary: DfsAdmin listOpenFiles should report files by type Key: HDFS-12969 URL: https://issues.apache.org/jira/browse/HDFS-12969 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.1.0 Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy HDFS-11847 has introduced a new {{-blockingDecommission}} option to the existing {{dfsadmin -listOpenFiles}} command. But the reporting done by the command doesn't differentiate the files based on their type (like blocking decommission). In order to change the reporting style, the proto format used for the base command has to be updated to carry additional fields, and this is better done in a new jira outside of HDFS-11847. This jira is to track the end-to-end enhancements needed for the dfsadmin -listOpenFiles console output.
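The type-differentiated reporting proposed here can be sketched as follows; {{OpenFilesType}}, the entry shape, and the labels below are simplified stand-ins for illustration, not the actual HDFS protocol classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OpenFilesReportSketch {
  enum OpenFilesType { ALL_OPEN_FILES, BLOCKING_DECOMMISSION }

  static class OpenFileEntry {
    final String path;
    final boolean blockingDecommission;
    OpenFileEntry(String path, boolean blockingDecommission) {
      this.path = path;
      this.blockingDecommission = blockingDecommission;
    }
  }

  /**
   * Builds console lines for the requested type. Carrying a type label in
   * each line is what lets the report differentiate entries instead of
   * printing an undifferentiated path list.
   */
  static List<String> report(List<OpenFileEntry> open, OpenFilesType type) {
    List<String> lines = new ArrayList<>();
    for (OpenFileEntry e : open) {
      if (type == OpenFilesType.BLOCKING_DECOMMISSION && !e.blockingDecommission) {
        continue;  // filter: only files holding up decommissioning
      }
      String label = e.blockingDecommission ? "BLOCKING_DECOMMISSION" : "OPEN";
      lines.add(label + "\t" + e.path);
    }
    return lines;
  }

  public static void main(String[] args) {
    List<OpenFileEntry> open = Arrays.asList(
        new OpenFileEntry("/user/a/log", false),
        new OpenFileEntry("/user/b/data", true));
    System.out.println(report(open, OpenFilesType.BLOCKING_DECOMMISSION));
  }
}
```

In the real command the type would travel in the {{OpenFileEntry}} proto field rather than a boolean, which is exactly the proto change this jira proposes.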
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Status: Patch Available (was: Open) > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12629.01.patch, > NN_UI_Summary_BlockCount_AfterFix.png, NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count.
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Attachment: HDFS-12629.01.patch Attached v01 patch to report separate block stats -- Replicated blocks and Erasure Coded block groups in the NN UI Summary page. [~eddyxu], can you please take a look at the patch? > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12629.01.patch, > NN_UI_Summary_BlockCount_AfterFix.png, NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count.
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Attachment: NN_UI_Summary_BlockCount_AfterFix.png > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12629.01.patch, > NN_UI_Summary_BlockCount_AfterFix.png, NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count.
[jira] [Updated] (HDFS-12959) Fix TestOpenFilesWithSnapshot redundant configurations
[ https://issues.apache.org/jira/browse/HDFS-12959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12959: -- Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) > Fix TestOpenFilesWithSnapshot redundant configurations > -- > > Key: HDFS-12959 > URL: https://issues.apache.org/jira/browse/HDFS-12959 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Minor > Fix For: 3.1.0 > > Attachments: HDFS-12959.01.patch > > > Fix the redundant configurations that are set in > {{TestOpenFilesWithSnapshot#testPointInTimeSnapshotCopiesForOpenFiles}} and > {{TestOpenFilesWithSnapshot#testOpenFilesSnapChecksumWithTrunkAndAppend}}. > These redundant configurations give an impression that they are needed for > the tests to pass, but in fact they are not.
[jira] [Commented] (HDFS-12959) Fix TestOpenFilesWithSnapshot redundant configurations
[ https://issues.apache.org/jira/browse/HDFS-12959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300734#comment-16300734 ] Manoj Govindassamy commented on HDFS-12959: --- Thanks for the review [~eddyxu]. The test failure is not related to the patch. Pushed the changes to trunk. > Fix TestOpenFilesWithSnapshot redundant configurations > -- > > Key: HDFS-12959 > URL: https://issues.apache.org/jira/browse/HDFS-12959 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Minor > Attachments: HDFS-12959.01.patch > > > Fix the redundant configurations that are set in > {{TestOpenFilesWithSnapshot#testPointInTimeSnapshotCopiesForOpenFiles}} and > {{TestOpenFilesWithSnapshot#testOpenFilesSnapChecksumWithTrunkAndAppend}}. > These redundant configurations give an impression that they are needed for > the tests to pass, but in fact they are not.
[jira] [Updated] (HDFS-12959) Fix TestOpenFilesWithSnapshot redundant configurations
[ https://issues.apache.org/jira/browse/HDFS-12959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12959: -- Status: Patch Available (was: Open) > Fix TestOpenFilesWithSnapshot redundant configurations > -- > > Key: HDFS-12959 > URL: https://issues.apache.org/jira/browse/HDFS-12959 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Minor > Attachments: HDFS-12959.01.patch > > > Fix the redundant configurations that are set in > {{TestOpenFilesWithSnapshot#testPointInTimeSnapshotCopiesForOpenFiles}} and > {{TestOpenFilesWithSnapshot#testOpenFilesSnapChecksumWithTrunkAndAppend}}. > These redundant configurations give an impression that they are needed for > the tests to pass, but in fact they are not.
[jira] [Updated] (HDFS-12959) Fix TestOpenFilesWithSnapshot redundant configurations
[ https://issues.apache.org/jira/browse/HDFS-12959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12959: -- Attachment: HDFS-12959.01.patch Attached v01 patch to remove the redundant configurations in TestOpenFilesWithSnapshot. [~eddyxu], can you please take a look at the patch? > Fix TestOpenFilesWithSnapshot redundant configurations > -- > > Key: HDFS-12959 > URL: https://issues.apache.org/jira/browse/HDFS-12959 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Minor > Attachments: HDFS-12959.01.patch > > > Fix the redundant configurations that are set in > {{TestOpenFilesWithSnapshot#testPointInTimeSnapshotCopiesForOpenFiles}} and > {{TestOpenFilesWithSnapshot#testOpenFilesSnapChecksumWithTrunkAndAppend}}. > These redundant configurations give an impression that they are needed for > the tests to pass, but in fact they are not.
[jira] [Created] (HDFS-12959) Fix TestOpenFilesWithSnapshot redundant configurations
Manoj Govindassamy created HDFS-12959: - Summary: Fix TestOpenFilesWithSnapshot redundant configurations Key: HDFS-12959 URL: https://issues.apache.org/jira/browse/HDFS-12959 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.0.0 Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy Priority: Minor Fix the redundant configurations that are set in {{TestOpenFilesWithSnapshot#testPointInTimeSnapshotCopiesForOpenFiles}} and {{TestOpenFilesWithSnapshot#testOpenFilesSnapChecksumWithTrunkAndAppend}}. These redundant configurations give an impression that they are needed for the tests to pass, but in fact they are not.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: HDFS-11847.04.patch Attached v04 patch to address the TestAnnotations failure in the previous test run. Other test failures are not related to the patch. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, > HDFS-11847.03.patch, HDFS-11847.04.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Assigned] (HDFS-12953) XORRawDecoder.doDecode throws NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy reassigned HDFS-12953: - Assignee: Manoj Govindassamy (was: Lei (Eddy) Xu) > XORRawDecoder.doDecode throws NullPointerException > -- > > Key: HDFS-12953 > URL: https://issues.apache.org/jira/browse/HDFS-12953 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Manoj Govindassamy > > Thanks [~danielpol] report on HDFS-12860. > {noformat} > 17/11/30 04:19:55 INFO mapreduce.Job: map 0% reduce 0% > 17/11/30 04:20:01 INFO mapreduce.Job: Task Id : > attempt_1512036058655_0003_m_02_0, Status : FAILED > Error: java.lang.NullPointerException > at > org.apache.hadoop.io.erasurecode.rawcoder.XORRawDecoder.doDecode(XORRawDecoder.java:83) > at > org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:106) > at > org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) > at > org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:423) > at > org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) > at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:382) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:318) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:563) > at > org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) > at > 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {noformat}
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: HDFS-11847.03.patch > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, > HDFS-11847.03.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: (was: HDFS-11847.03.patch) > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: HDFS-11847.03.patch Thanks for the detailed review [~xiaochen]. Attached v03 patch to address the following. Please take a look.
1. Deprecated the old API and added a new one which accepts an additional type argument to filter the result by.
2. Updated {{FSN#listOpenFiles}} to check for the {{ALL_OPEN_FILES}} type first and for the combination filtering option later. But the result set and the reporting don't differentiate the entries by type. For this, we need to add the type to the {{OpenFileEntry}}. Will do this.
3. About printing DataNode details in the results: planning to take this enhancement along with the pending item in (2) in a separate jira, if you are ok. I need to change the proto, the handling of the
4. Yes, better to return as many results as possible. Made {{DatanodeAdminManager#processBlocksInternal}} log a warning message on unexpected open files and continue to the next one.
5. In {{DatanodeAdminManager#processBlocksInternal}}, the computation is at the DataNode level. There can be multiple blocks across DNs for the same file, and the full count needs to be tracked for JMX reporting purposes. So, retaining the existing lowRedundancyBlocksInOpenFiles field. When I removed this field and piggybacked on {{lowRedundancyOpenFiles.size()}}, the actual count was less than expected for a few tests.
6. In {{LeavingServiceStatus}}, both members are needed due to (5).
7. Updated the comment for the class {{LeavingServiceStatus}}.
8. {{FSN#getFilesBlockingDecom}}: added hasReadLock().
9. {{TestDecommission#verifyOpenFilesBlockingDecommission}}: the PrintStream is now copied before the exchange and restored to the old one. Good find.
10. {{DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY}} value is actually in seconds. So, it is 1000 seconds and not 1 sec. Anyways, updated this to the max value. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, > HDFS-11847.03.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: HDFS-11847.02.patch Attached v02 patch with more unit tests added. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Status: Patch Available (was: Open) > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch > > > HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. The proposal > here is to add suboptions to {{listOpenFiles}} for the above case.
[jira] [Updated] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-11847: -- Attachment: HDFS-11847.01.patch Attached v01 patch to address the following: 1. Ability to query for open files by the type of interest - like BLOCKING_DECOMMISSION, ALL, etc. 2. A new method FSNamesystem#getFilesBlockingDecom() to get the list of all open files blocking decommission 3. Basic tests. More sophisticated tests pending. > Enhance dfsadmin listOpenFiles command to list files blocking datanode > decommissioning > -- > > Key: HDFS-11847 > URL: https://issues.apache.org/jira/browse/HDFS-11847 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11847.01.patch > > > HDFS-10480 adds {{listOpenFiles}} option is to {{dfsadmin}} command to list > all the open files in the system. > Additionally, it would be very useful to only list open files that are > blocking the DataNode decommissioning. With thousand+ node clusters, where > there might be machines added and removed regularly for maintenance, any > option to monitor and debug decommissioning status is very helpful. Proposal > here is to add suboptions to {{listOpenFiles}} for the above case. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12918) NameNode fails to start after upgrade - Missing state in ECPolicy Proto
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288591#comment-16288591 ] Manoj Govindassamy edited comment on HDFS-12918 at 12/13/17 5:46 AM: - We have an upgrade-incompatible fix landed in 3.0 at e565b5277d5b890dad107fe85e295a3907e4bfc1. The fix is necessary: it verifies the EC Policy state when loading the FSImage. This issue has nothing to do with the default value for the ECPolicyState field in the ErasureCodingPolicyProto. While the ECPolicyState field is optional in the ECPolicyProto message for over-the-wire communication, it is mandatory in the FSImage for EC files. I hope the upgrade-incompatible changes before the C6 GA are ok. Please let me know if you have other thoughts. was (Author: manojg): We have an upgrade incompatible fix landed in C6 at e565b5277d5b890dad107fe85e295a3907e4bfc1. The fix is necessary and it verifies the EC Policy state when loading FSImage. This issue is nothing to do with the default value for the ECPolicyState field in the ErasureCodingPolicyProto. While the ECPolicyState field is optional in ECPolocyProto message for over the wire communications, but its mandatory in FSImage for the EC files. I hope the upgrade incompatible changes before the C6 GA are ok. Please let me know if you have other thoughts. > NameNode fails to start after upgrade - Missing state in ECPolicy Proto > > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. 
*/ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. > It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
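The backward-compatibility concern discussed above comes down to how a missing optional state field should be interpreted when loading an old FSImage. A minimal, hedged sketch of tolerant defaulting (class and method names here are hypothetical stand-ins, not the real PBHelperClient API, and no actual protobuf classes are used):

```java
// Hedged sketch only: EcState and stateOrDefault are illustrative names
// standing in for ErasureCodingPolicyState and the PBHelperClient convert
// logic. Real protobuf-generated messages are not used here.
public class EcStateDefaulting {
    // Mirrors the ErasureCodingPolicyState enum values from hdfs.proto.
    public enum EcState { DISABLED, ENABLED, REMOVED }

    /**
     * hasState mirrors the protobuf hasState() accessor for the optional
     * field. FSImages written before the state field existed would report
     * hasState == false; falling back to DISABLED (the documented default
     * state) avoids failing a Preconditions check on upgrade.
     */
    public static EcState stateOrDefault(boolean hasState, EcState wireState) {
        return hasState ? wireState : EcState.DISABLED;
    }
}
```

The trade-off discussed in the thread is exactly this: silently defaulting keeps old images loadable, while the hard Preconditions check makes the state field effectively mandatory in the FSImage.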
[jira] [Resolved] (HDFS-12918) NameNode fails to start after upgrade - Missing state in ECPolicy Proto
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy resolved HDFS-12918. --- Resolution: Won't Fix We have an upgrade-incompatible fix landed in C6 at e565b5277d5b890dad107fe85e295a3907e4bfc1. The fix is necessary: it verifies the EC Policy state when loading the FSImage. This issue has nothing to do with the default value for the ECPolicyState field in the ErasureCodingPolicyProto. While the ECPolicyState field is optional in the ECPolicyProto message for over-the-wire communication, it is mandatory in the FSImage for EC files. I hope the upgrade-incompatible changes before the C6 GA are ok. Please let me know if you have other thoughts. > NameNode fails to start after upgrade - Missing state in ECPolicy Proto > > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. */ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. 
> It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12918) NameNode fails to start after upgrade - Missing state in ECPolicy Proto
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12918: -- Affects Version/s: 3.0.0-beta1 Component/s: hdfs > NameNode fails to start after upgrade - Missing state in ECPolicy Proto > > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. */ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. > It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12918) NameNode fails to start after upgrade - Missing state in ECPolicy Proto
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12918: -- Summary: NameNode fails to start after upgrade - Missing state in ECPolicy Proto (was: EC Policy defaults incorrectly to enabled in protobufs) > NameNode fails to start after upgrade - Missing state in ECPolicy Proto > > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. */ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. > It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12918) EC Policy defaults incorrectly to enabled in protobufs
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288521#comment-16288521 ] Manoj Govindassamy commented on HDFS-12918: --- A new check added in the convert does not seem to be backward compatible. It is going to break the upgrade from a previous image format where the ErasureCodingPolicyProto didn't have the state field. It is supposed to be an optional field, and the below check needs to be relaxed as well. [~xiaochen], your thoughts please? {noformat} /** * Convert the protobuf to a {@link ErasureCodingPolicyInfo}. This should only * be needed when the caller is interested in the state of the policy. */ public static ErasureCodingPolicyInfo convertErasureCodingPolicyInfo( ErasureCodingPolicyProto proto) { ErasureCodingPolicy policy = convertErasureCodingPolicy(proto); ErasureCodingPolicyInfo info = new ErasureCodingPolicyInfo(policy); Preconditions.checkArgument(proto.hasState(),<== "Missing state field in ErasureCodingPolicy proto"); info.setState(convertECState(proto.getState())); return info; } {noformat} > EC Policy defaults incorrectly to enabled in protobufs > -- > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. 
*/ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. > It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-5926) Documentation should clarify dfs.datanode.du.reserved impact from reserved disk capacity
[ https://issues.apache.org/jira/browse/HDFS-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-5926: - Summary: Documentation should clarify dfs.datanode.du.reserved impact from reserved disk capacity (was: documation should clarify dfs.datanode.du.reserved wrt reserved disk capacity) > Documentation should clarify dfs.datanode.du.reserved impact from reserved > disk capacity > > > Key: HDFS-5926 > URL: https://issues.apache.org/jira/browse/HDFS-5926 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 0.20.2 >Reporter: Alexander Fahlke >Assignee: Gabor Bota >Priority: Minor > Labels: newbie > Attachments: HDFS-5926-1.patch > > > I'm using hadoop-0.20.2 on Debian Squeeze and ran into the same confusion as > many others with the parameter for dfs.datanode.du.reserved. One day some > data nodes got out of disk errors although there was space left on the disks. > The following values are rounded to make the problem more clear: > - the disk for the DFS data has 1000GB and only one Partition (ext3) for DFS > data > - you plan to set the dfs.datanode.du.reserved to 20GB > - the reserved reserved-blocks-percentage by tune2fs is 5% (the default) > That gives all users, except root, 5% less capacity that they can use. > Although the System reports the total of 1000GB as usable for all users via > df. The hadoop-deamons are not running as root. > If i read it right, than hadoop get's the free capacity via df. > > Starting in > {{/src/hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java}} on line > 350: {{return usage.getCapacity()-reserved;}} > going to {{/src/core/org/apache/hadoop/fs/DF.java}} which says: > {{"Filesystem disk space usage statistics. Uses the unix 'df' program"}} > When you have 5% reserved by tune2fs (in our case 50GB) and you give > dfs.datanode.du.reserved only 20GB, than you can possibly ran into out of > disk errors that hadoop can't handle. 
> In this case you must add the planned 20GB du reserved to the reserved > capacity by tune2fs. This results in (at least) 70GB for > dfs.datanode.du.reserved in my case. > Two ideas: > # The documentation must be clear at this point to avoid this problem. > # Hadoop could check for reserved space by tune2fs (or other tools) and add > this value to the dfs.datanode.du.reserved parameter. > This ticket is a follow up from the Mailinglist: > https://mail-archives.apache.org/mod_mbox/hadoop-common-user/201312.mbox/%3CCAHodO=Kbv=13T=2otz+s8nsodbs1icnzqyxt_0wdfxy5gks...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
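The capacity arithmetic in the report above can be sketched as follows. This is a hedged illustration of the overlap between the ext3 reserved-blocks percentage (tune2fs) and dfs.datanode.du.reserved; {{usableForDfs}} is a hypothetical helper, not actual DataNode code (FSDataset derives capacity from df, which does not subtract the filesystem's own reserve):

```java
// Hedged sketch: shows why dfs.datanode.du.reserved must also cover the
// filesystem's reserved blocks (tune2fs -m). Not real FSDataset code.
public class DuReservedMath {
    /** Bytes a non-root DFS daemon can actually write on the partition. */
    public static long usableForDfs(long diskBytes, double fsReservedPct,
                                    long duReservedBytes) {
        long fsReserved = (long) (diskBytes * fsReservedPct);
        // Both reservations come out of the same pool of free space,
        // so they add up rather than overlap.
        return diskBytes - fsReserved - duReservedBytes;
    }
}
```

With the rounded numbers from the report (1000 GB disk, 5% tune2fs reserve, 20 GB du.reserved), a non-root daemon can only use 930 GB, which is why setting dfs.datanode.du.reserved to 70 GB (20 GB planned plus the 50 GB fs reserve) is what actually avoids the out-of-disk errors.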
[jira] [Commented] (HDFS-12918) EC Policy defaults incorrectly to enabled in protobufs
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288378#comment-16288378 ] Manoj Govindassamy commented on HDFS-12918: --- [~zamsden], There is an addendum patch HDFS-12682 after HDFS-12258 to make the policy immutable by pulling the EC state to {{ErasureCodingPolicyInfo}}. As you pointed out {{hdfs.proto}} default value looks wrong to me as well. But, in the PBHelperClient code there is an explicit handling for this, both while saving the ECPolicy and while retrieving. So, ECPI saved and retrieved from FSImages should be right. {{PBHelperClient}} {noformat} /** * Convert the protobuf to a {@link ErasureCodingPolicyInfo}. This should only * be needed when the caller is interested in the state of the policy. */ public static ErasureCodingPolicyInfo convertErasureCodingPolicyInfo( ErasureCodingPolicyProto proto) { ErasureCodingPolicy policy = convertErasureCodingPolicy(proto); ErasureCodingPolicyInfo info = new ErasureCodingPolicyInfo(policy); Preconditions.checkArgument(proto.hasState(), "Missing state field in ErasureCodingPolicy proto"); info.setState(convertECState(proto.getState())); < return info; } /** * Convert a {@link ErasureCodingPolicyInfo} to protobuf. * The protobuf will have the policy, and state. State is relevant when: * 1. Persisting a policy to fsimage * 2. 
Returning the policy to the RPC call * {@link DistributedFileSystem#getAllErasureCodingPolicies()} */ public static ErasureCodingPolicyProto convertErasureCodingPolicy( ErasureCodingPolicyInfo info) { final ErasureCodingPolicyProto.Builder builder = createECPolicyProtoBuilder(info.getPolicy()); builder.setState(convertECState(info.getState())); <=== return builder.build(); } {noformat} Listing Policies: {noformat} $ hdfs ec -listPolicies Erasure Coding Policies: ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=ENABLED ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED {noformat} But, there is another version of {{convertErasureCodingPolicy}} which takes in only {{ErasureCodingPolicy}} where the state is missing and the default state from {{ErasureCodingPolicyProto}} will be used. {noformat} /** * Convert a {@link ErasureCodingPolicy} to protobuf. * This means no state of the policy will be set on the protobuf. */ public static ErasureCodingPolicyProto convertErasureCodingPolicy( ErasureCodingPolicy policy) { return createECPolicyProtoBuilder(policy).build(); } {noformat} Probably you are seeing the default value of the EC state from the callers (like ListStatus, BlockRecovery, BlockGroupChecksum etc.,) of the above convert util. Can you please confirm where you are seeing the inconsistent EC state? 
> EC Policy defaults incorrectly to enabled in protobufs > -- > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. */ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. > It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA
[jira] [Assigned] (HDFS-12918) EC Policy defaults incorrectly to enabled in protobufs
[ https://issues.apache.org/jira/browse/HDFS-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy reassigned HDFS-12918: - Assignee: Manoj Govindassamy I can take a look at this if you haven't already started to work on the patch. Please let me know. > EC Policy defaults incorrectly to enabled in protobufs > -- > > Key: HDFS-12918 > URL: https://issues.apache.org/jira/browse/HDFS-12918 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zach Amsden >Assignee: Manoj Govindassamy >Priority: Critical > > According to documentation and code comments, the default setting for erasure > coding policy is disabled: > /** Policy is disabled. It's policy default state. */ > DISABLED(1), > However, HDFS-12258 appears to have incorrectly set the policy state in the > protobuf to enabled: > {code:java} > message ErasureCodingPolicyProto { > ooptional string name = 1; > optional ECSchemaProto schema = 2; > optional uint32 cellSize = 3; > required uint32 id = 4; // Actually a byte - only 8 bits used > + optional ErasureCodingPolicyState state = 5 [default = ENABLED]; > } > {code} > This means the parameter can't actually be optional, it must always be > included, and existing serialized data without this optional field will be > incorrectly interpreted as having erasure coding enabled. > This unnecessarily breaks compatibility and will require existing HDFS > installations that store metadata in protobufs to require reformatting. > It looks like a simple mistake that was overlooked in code review. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12855) Fsck violates namesystem locking
[ https://issues.apache.org/jira/browse/HDFS-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy reassigned HDFS-12855: - Assignee: Manoj Govindassamy > Fsck violates namesystem locking > - > > Key: HDFS-12855 > URL: https://issues.apache.org/jira/browse/HDFS-12855 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.4 >Reporter: Konstantin Shvachko >Assignee: Manoj Govindassamy > > {{NamenodeFsck}} access {{FSNamesystem}} structures, such as INodes, > BlockInfo without holding a lock. See e.g. {{NamenodeFsck.blockIdCK()}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12825) Fsck report shows config key name for min replication issues
[ https://issues.apache.org/jira/browse/HDFS-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284172#comment-16284172 ] Manoj Govindassamy commented on HDFS-12825: --- Thanks for the contribution [~gabor.bota]. Committed to trunk. > Fsck report shows config key name for min replication issues > > > Key: HDFS-12825 > URL: https://issues.apache.org/jira/browse/HDFS-12825 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Harshakiran Reddy >Assignee: Gabor Bota >Priority: Minor > Labels: newbie > Fix For: 3.1.0 > > Attachments: HDFS-12825.001.patch, error.JPG > > > Scenario: > Corrupt the Block in any datanode > Take the *FSCK *Report for that file. > Actual Output: > == > printing the direct configuration in fsck report > {{dfs.namenode.replication.min}} > Expected Output: > > it should be {{MINIMAL BLOCK REPLICATION}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12825) Fsck report shows config key name for min replication issues
[ https://issues.apache.org/jira/browse/HDFS-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12825: -- Resolution: Fixed Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) > Fsck report shows config key name for min replication issues > > > Key: HDFS-12825 > URL: https://issues.apache.org/jira/browse/HDFS-12825 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Harshakiran Reddy >Assignee: Gabor Bota >Priority: Minor > Labels: newbie > Fix For: 3.1.0 > > Attachments: HDFS-12825.001.patch, error.JPG > > > Scenario: > Corrupt the Block in any datanode > Take the *FSCK *Report for that file. > Actual Output: > == > printing the direct configuration in fsck report > {{dfs.namenode.replication.min}} > Expected Output: > > it should be {{MINIMAL BLOCK REPLICATION}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12825) Fsck report shows config key name for min replication issues
[ https://issues.apache.org/jira/browse/HDFS-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12825: -- Summary: Fsck report shows config key name for min replication issues (was: After Block Corrupted, FSCK Report printing the Direct configuration. ) > Fsck report shows config key name for min replication issues > > > Key: HDFS-12825 > URL: https://issues.apache.org/jira/browse/HDFS-12825 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Harshakiran Reddy >Assignee: Gabor Bota >Priority: Minor > Labels: newbie > Attachments: HDFS-12825.001.patch, error.JPG > > > Scenario: > Corrupt the Block in any datanode > Take the *FSCK *Report for that file. > Actual Output: > == > printing the direct configuration in fsck report > {{dfs.namenode.replication.min}} > Expected Output: > > it should be {{MINIMAL BLOCK REPLICATION}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12825) Fsck report shows config key name for min replication issues
[ https://issues.apache.org/jira/browse/HDFS-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12825: -- Labels: incompatibleChange newbie (was: newbie) > Fsck report shows config key name for min replication issues > > > Key: HDFS-12825 > URL: https://issues.apache.org/jira/browse/HDFS-12825 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Harshakiran Reddy >Assignee: Gabor Bota >Priority: Minor > Labels: incompatibleChange, newbie > Attachments: HDFS-12825.001.patch, error.JPG > > > Scenario: > Corrupt the Block in any datanode > Take the *FSCK *Report for that file. > Actual Output: > == > printing the direct configuration in fsck report > {{dfs.namenode.replication.min}} > Expected Output: > > it should be {{MINIMAL BLOCK REPLICATION}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12825) Fsck report shows config key name for min replication issues
[ https://issues.apache.org/jira/browse/HDFS-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12825: -- Labels: newbie (was: incompatibleChange newbie) Hadoop Flags: Incompatible change > Fsck report shows config key name for min replication issues > > > Key: HDFS-12825 > URL: https://issues.apache.org/jira/browse/HDFS-12825 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Harshakiran Reddy >Assignee: Gabor Bota >Priority: Minor > Labels: newbie > Attachments: HDFS-12825.001.patch, error.JPG > > > Scenario: > Corrupt the Block in any datanode > Take the *FSCK *Report for that file. > Actual Output: > == > printing the direct configuration in fsck report > {{dfs.namenode.replication.min}} > Expected Output: > > it should be {{MINIMAL BLOCK REPLICATION}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12825) After Block Corrupted, FSCK Report printing the Direct configuration.
[ https://issues.apache.org/jira/browse/HDFS-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282239#comment-16282239 ] Manoj Govindassamy commented on HDFS-12825: --- Patch looks good to me. +1. Thanks for working on this [~gabor.bota] and thanks for reporting [~Harsha1206], [~usharani]. [~gabor.bota], I would prefer labelling this jira as Incompatible change since it changes the fsck output format. > After Block Corrupted, FSCK Report printing the Direct configuration. > --- > > Key: HDFS-12825 > URL: https://issues.apache.org/jira/browse/HDFS-12825 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Harshakiran Reddy >Assignee: Gabor Bota >Priority: Minor > Labels: newbie > Attachments: HDFS-12825.001.patch, error.JPG > > > Scenario: > Corrupt the Block in any datanode > Take the *FSCK *Report for that file. > Actual Output: > == > printing the direct configuration in fsck report > {{dfs.namenode.replication.min}} > Expected Output: > > it should be {{MINIMAL BLOCK REPLICATION}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12855) Fsck violates namesystem locking
[ https://issues.apache.org/jira/browse/HDFS-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269303#comment-16269303 ] Manoj Govindassamy commented on HDFS-12855: --- The latest trunk code is also similar to 2.7.x line. {{NamenodeFsck.blockIdCK()}} works on BlockManager and FSNameSystem layer directly without holding NameSystem locks. One race I can think of is fsck with block id option running in parallel with a file deletion which contains the same block. Since the BlockInfo is obtained without holding a lock, the file could get deleted and later the INode retrieval could return null and could face NPE when accessing INode members. Haven't proved this with a test yet though. {noformat} public void blockIdCK(String blockId) { ... try { //get blockInfo Block block = new Block(Block.getBlockId(blockId)); //find which file this block belongs to BlockInfo blockInfo = blockManager.getStoredBlock(block); if(blockInfo == null) { out.println("Block "+ blockId +" " + NONEXISTENT_STATUS); LOG.warn("Block "+ blockId + " " + NONEXISTENT_STATUS); return; } final INodeFile iNode = namenode.getNamesystem().getBlockCollection(blockInfo); NumberReplicas numberReplicas= blockManager.countNodes(blockInfo); out.println("Block Id: " + blockId); out.println("Block belongs to: "+iNode.getFullPathName()); out.println("No. of Expected Replica: " + blockManager.getExpectedRedundancyNum(blockInfo)); {noformat} > Fsck violates namesystem locking > - > > Key: HDFS-12855 > URL: https://issues.apache.org/jira/browse/HDFS-12855 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.4 >Reporter: Konstantin Shvachko > > {{NamenodeFsck}} access {{FSNamesystem}} structures, such as INodes, > BlockInfo without holding a lock. See e.g. {{NamenodeFsck.blockIdCK()}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
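The race described in the comment above can be made concrete with a small sketch: because the BlockInfo and INode lookups in {{NamenodeFsck.blockIdCK()}} run without the namesystem lock, the INode resolved from the block can be null by the time it is dereferenced. The names below are illustrative only, not the real NamenodeFsck code:

```java
// Hedged sketch of the TOCTOU hazard in blockIdCK(): a concurrent file
// delete between the unsynchronized lookups can null out the INode, so
// the result must be guarded before calling into it.
public class FsckNullGuard {
    /** Formats block status, tolerating an INode deleted mid-check. */
    public static String describeBlock(String blockId, String inodePathOrNull) {
        if (inodePathOrNull == null) {
            // Without this guard, iNode.getFullPathName() would NPE.
            return "Block " + blockId + " does not exist";
        }
        return "Block " + blockId + " belongs to: " + inodePathOrNull;
    }
}
```

A null guard like this only hides the symptom; the alternative the thread implies is taking the namesystem read lock around the block-to-INode resolution so both lookups see a consistent view.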
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Thanks for the review [~yzhangal] and [~hanishakoneru]. Committed it to trunk. > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0 > > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257326#comment-16257326 ] Manoj Govindassamy commented on HDFS-12730: --- Test failures are not related to the patch. > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256324#comment-16256324 ] Manoj Govindassamy commented on HDFS-12823: --- v02 LGTM, +1. Thanks [~xkrogen]. > Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to > branch-2.7 > > > Key: HDFS-12823 > URL: https://issues.apache.org/jira/browse/HDFS-12823 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, hdfs-client >Reporter: Erik Krogen >Assignee: Erik Krogen > Attachments: HDFS-12823-branch-2.7.000.patch, > HDFS-12823-branch-2.7.001.patch, HDFS-12823-branch-2.7.002.patch > > > Given the pretty significant performance implications of HDFS-9259 (see > discussion in HDFS-10326) when doing transfers across high latency links, it > would be helpful to have this configurability exist in the 2.7 series. > Opening a new JIRA since the original HDFS-9259 has been closed for a while > and there are conflicts due to a few classes moving. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256137#comment-16256137 ] Manoj Govindassamy commented on HDFS-12823: --- Thanks for the extra efforts [~xkrogen]. Much appreciated. +1, pending Jenkins. > Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to > branch-2.7 > > > Key: HDFS-12823 > URL: https://issues.apache.org/jira/browse/HDFS-12823 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, hdfs-client >Reporter: Erik Krogen >Assignee: Erik Krogen > Attachments: HDFS-12823-branch-2.7.000.patch, > HDFS-12823-branch-2.7.001.patch > > > Given the pretty significant performance implications of HDFS-9259 (see > discussion in HDFS-10326) when doing transfers across high latency links, it > would be helpful to have this configurability exist in the 2.7 series. > Opening a new JIRA since the original HDFS-9259 has been closed for a while > and there are conflicts due to a few classes moving. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256043#comment-16256043 ] Manoj Govindassamy commented on HDFS-12823: --- [~xkrogen], Yes, not a good idea to introduce getters and setters for all those 50+ fields as part of this jira. Adding a getter for the newly added ones will be better though. Otherwise, the v0 patch LGTM, +1. Thanks for working on this. > Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to > branch-2.7 > > > Key: HDFS-12823 > URL: https://issues.apache.org/jira/browse/HDFS-12823 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, hdfs-client >Reporter: Erik Krogen >Assignee: Erik Krogen > Attachments: HDFS-12823-branch-2.7.000.patch > > > Given the pretty significant performance implications of HDFS-9259 (see > discussion in HDFS-10326) when doing transfers across high latency links, it > would be helpful to have this configurability exist in the 2.7 series. > Opening a new JIRA since the original HDFS-9259 has been closed for a while > and there are conflicts due to a few classes moving. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255937#comment-16255937 ] Manoj Govindassamy commented on HDFS-12823: --- [~xkrogen], Can we please make use of {{getSocketSendBufferSize()}} instead of directly referring to the member variable in the below check in {{DFSOutputStream}}? {noformat} if (client.getConf().socketSendBufferSize > 0) { sock.setSendBufferSize(client.getConf().socketSendBufferSize); } {noformat} > Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to > branch-2.7 > > > Key: HDFS-12823 > URL: https://issues.apache.org/jira/browse/HDFS-12823 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, hdfs-client >Reporter: Erik Krogen >Assignee: Erik Krogen > Attachments: HDFS-12823-branch-2.7.000.patch > > > Given the pretty significant performance implications of HDFS-9259 (see > discussion in HDFS-10326) when doing transfers across high latency links, it > would be helpful to have this configurability exist in the 2.7 series. > Opening a new JIRA since the original HDFS-9259 has been closed for a while > and there are conflicts due to a few classes moving. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
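The suggested change can be sketched as follows. This is an illustrative simplification, assuming a pared-down stand-in for {{DFSClient.Conf}} (the real class has many more fields); it shows the call site going through a getter for the newly added field rather than touching the member variable directly:

```java
// Simplified stand-in for DFSClient.Conf and the DFSOutputStream check:
// the new field gets a getter (only the newly added field, per the review),
// and callers use it instead of the raw member.
public class SendBufConfSketch {
    public static class Conf {
        private final int socketSendBufferSize;
        public Conf(int size) { this.socketSendBufferSize = size; }
        // Getter added for the newly introduced setting.
        public int getSocketSendBufferSize() { return socketSendBufferSize; }
    }

    // Returns the buffer size that would be applied to the socket, or -1
    // when the configured value is non-positive and the JDK default is kept.
    public static int applySendBufferSize(Conf conf) {
        if (conf.getSocketSendBufferSize() > 0) {
            // in the real code: sock.setSendBufferSize(...)
            return conf.getSocketSendBufferSize();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(applySendBufferSize(new Conf(131072)));  // 131072
        System.out.println(applySendBufferSize(new Conf(0)));       // -1
    }
}
```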
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Attachment: HDFS-12730.02.patch Attached v02 patch to address the comment. -- added a case to verify the config switched on to off and the effect of file lengths for the open files in the newly taken snapshots. [~yzhangal], [~hanishakoneru], can you please take a look? > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254642#comment-16254642 ] Manoj Govindassamy edited comment on HDFS-12730 at 11/16/17 2:16 AM: - Thanks for the review [~hanishakoneru]. Thats right, after the config change and after a fresh meta data change all the previously opened files will turn immutable. [~yzhangal], [~eddyxu] can you also please take a look at the patch? was (Author: manojg): Thanks for the review [~hanishakoneru]. Thats right, after the config change and after a fresh meta data change all the previously opened files will turn immutable. [~yzhangal], can you also please take a look at the patch? > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12730.01.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Status: Patch Available (was: Open) > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12730.01.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Attachment: HDFS-12730.01.patch Attached v01 patch to verify the attributes of the open files captured in snapshots with/without config. [~yzhangal], can you please take a look at the patch? > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12730.01.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Description: Open files captured in the snapshots have their meta data preserved based on the config _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the upgrade scenario or when the NameNode gets restarted with config turned on or off, the attributes of the open files captured in the snapshots are influenced accordingly. Better to have a test case to verify open file attributes across config turn on and off, and the current expected behavior with HDFS-11402 so as to catch any regressions in the future. was: Open files captured in the snapshots have their meta data preserved based on the config _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). It is possible for the NameNode to get restarted with config turned on or off and the attributes of the open files captured in the snapshots are influenced accordingly. Better to have a test case to verify open file attributes across config turn on and off, and the current expected behavior with HDFS-11402 so as to catch any regressions in the future. > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. 
Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future.
[jira] [Created] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
Manoj Govindassamy created HDFS-12730: - Summary: Verify open files captured in the snapshots across config disable and enable Key: HDFS-12730 URL: https://issues.apache.org/jira/browse/HDFS-12730 Project: Hadoop HDFS Issue Type: Test Components: hdfs Affects Versions: 3.0.0 Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy Open files captured in the snapshots have their meta data preserved based on the config _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). It is possible for the NameNode to get restarted with config turned on or off and the attributes of the open files captured in the snapshots are influenced accordingly. Better to have a test case to verify open file attributes across config turn on and off, and the current expected behavior with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219923#comment-16219923 ] Manoj Govindassamy commented on HDFS-12544: --- The build failure doesn't look related to the commit. {noformat} [INFO] Apache Hadoop Cloud Storage Project FAILURE [ 3.193 s] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 17:45 min [INFO] Finished at: 2017-10-26T01:22:04+00:00 [INFO] Final Memory: 440M/4669M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.8.1:deploy (default-deploy) on project hadoop-cloud-storage-project: Failed to retrieve remote metadata org.apache.hadoop:hadoop-main:3.1.0-SNAPSHOT/maven-metadata.xml: Could not transfer metadata org.apache.hadoop:hadoop-main:3.1.0-SNAPSHOT/maven-metadata.xml from/to apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots): Failed to transfer file: https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-main/3.1.0-SNAPSHOT/maven-metadata.xml. Return code is: 503 , ReasonPhrase:Service Unavailable. -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-cloud-storage-project Build step 'Execute shell' marked build as failure [JIRA] Updating issue HDFS-12544 [JIRA] Updating issue HADOOP-14957 [JIRA] Updating issue HDFS-12579 [JIRA] Updating issue YARN-4827 [JIRA] Updating issue HADOOP-14840 ERROR: No tool found matching LATEST1_8_HOME Setting MAVEN_3_3_3_HOME=/home/jenkins/tools/maven/apache-maven-3.3.3 Finished: FAILURE {noformat} > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.0.0 > > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch, HDFS-12544.05.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. 
So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root.
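The scoping behavior the issue describes can be sketched on the client side as a filter-and-relativize pass. The sketch below is a toy model, not the actual {{SnapshotDiffReport}} API: diff entries are plain "TYPE path" strings relative to the snapshot root, and only entries under the chosen descendant directory are kept, with their paths re-relativized to it:

```java
// Toy model of scoping a snapshot diff report to a descendant directory.
// Entry format assumed here: "TYPE path", e.g. "M dir1/file2" (M=modified,
// D=deleted, +=created), with paths relative to the snapshot root.
import java.util.ArrayList;
import java.util.List;

public class ScopedDiffSketch {
    public static List<String> scopeDiff(List<String> entries, String scopeDir) {
        String prefix = scopeDir.endsWith("/") ? scopeDir : scopeDir + "/";
        List<String> scoped = new ArrayList<>();
        for (String e : entries) {
            int sp = e.indexOf(' ');           // split "TYPE path"
            String path = e.substring(sp + 1);
            if (path.startsWith(prefix)) {
                // keep the type marker, drop the scope-dir prefix
                scoped.add(e.substring(0, sp + 1) + path.substring(prefix.length()));
            }
        }
        return scoped;
    }

    public static void main(String[] args) {
        List<String> full = List.of("M dir1/file1", "D dir2/file2", "+ dir1/sub/file3");
        System.out.println(scopeDiff(full, "dir1"));  // [M file1, + sub/file3]
    }
}
```

Note this models only the filtering semantics; the point of HDFS-12544 is that the NameNode computes the diff for the sub-tree directly instead of filtering a full-root report after the fact.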
[jira] [Updated] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12544: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.0.0 > > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch, HDFS-12544.05.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219224#comment-16219224 ] Manoj Govindassamy commented on HDFS-12544: --- Thanks for the review [~yzhangal]. Fixed the checkstyle issue and committed to trunk. Filed HADOOP-14983 to track the DistCp enhancements to support snap root descendant directories. > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch, HDFS-12544.05.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12544: -- Attachment: HDFS-12544.05.patch Attached v05 patch with test updated to cover more cases discussed in the previous comment. [~yzhangal], can you please take a look at the latest patch? > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch, HDFS-12544.05.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217386#comment-16217386 ] Manoj Govindassamy commented on HDFS-12544: --- [~yzhangal], 1. Yes, just like the files moved out of the scope directory are showing as "Deleted", the files moved in under a scope directory as part of renames will show as "Added". 2. The newly created directory/files are available in the current version. So, even these newly created dirs can be requested for the scope diff. Its just that they are not part of any older snapshots so we will get empty diff list. Will post a new patch revision with tests updated to cover above cases. Thanks. > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. 
So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root.
[jira] [Updated] (HDFS-12653) Implement toArray() and subArray() for ReadOnlyList
[ https://issues.apache.org/jira/browse/HDFS-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12653: -- Attachment: HDFS-12653.01.patch Attached v01 patch to address the following 1. Implemented {{ReadOnlyList#toArray()}} and {{ReadOnlyList#subArray()}} to return an array view of the backing list 2. TestReadOnly - unit tests to verify various contracts in ReadOnlyList. ReadOnly#toArray() and ReadOnlyList#subArray() can be made use when getting attributes from INodeAttributesProvider (HDFS-12652) and when working on the children list for a snapshot. Will follow on these after completing this jira. [~eddyxu], [~yzhangal], [~daryn], can you please take a look at the patch. > Implement toArray() and subArray() for ReadOnlyList > --- > > Key: HDFS-12653 > URL: https://issues.apache.org/jira/browse/HDFS-12653 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12653.01.patch > > > {{ReadOnlyList}} today gives an unmodifiable view of the backing List. This > list supports following Util methods for easy construction of read only views > of any given list. > {noformat} > public static ReadOnlyList asReadOnlyList(final List list) > public static List asList(final ReadOnlyList list) > {noformat} > {{asList}} above additionally overrides {{Object[] toArray()}} of the > {{java.util.List}} interface. Unlike the {{java.util.List}}, the above one > returns an array of Objects referring to the backing list and avoid any > copying of objects. Given that we have many usages of read only lists, > 1. Lets have a light-weight / shared-view {{toArray()}} implementation for > {{ReadOnlyList}} as well. > 2. 
Additionally, similar to {{java.util.List#subList(fromIndex, toIndex)}}, > lets have {{ReadOnlyList#subArray(fromIndex, toIndex)}}
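To make the shared-view idea above concrete, here is a minimal, hypothetical sketch (not the actual HDFS-12653 patch; names and shapes are illustrative) of a {{toArray()}}/{{subArray()}} pair that copies only element references, never the elements themselves:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a read-only wrapper whose toArray()/subArray()
// build shallow arrays of element references. The backing list's elements
// are never cloned or copied, only their references are laid out in an array.
interface ReadOnlyList<E> {
  int size();
  E get(int i);

  static <E> ReadOnlyList<E> wrap(List<E> backing) {
    return new ReadOnlyList<E>() {
      public int size() { return backing.size(); }
      public E get(int i) { return backing.get(i); }
    };
  }

  // Whole-list shallow array: O(n) reference copies, no element copies.
  default Object[] toArray() {
    Object[] a = new Object[size()];
    for (int i = 0; i < a.length; i++) {
      a[i] = get(i);
    }
    return a;
  }

  // Analogous to List#subList(fromIndex, toIndex): half-open range.
  default Object[] subArray(int fromIndex, int toIndex) {
    Object[] a = new Object[toIndex - fromIndex];
    for (int i = fromIndex; i < toIndex; i++) {
      a[i - fromIndex] = get(i);
    }
    return a;
  }
}

public class ReadOnlyListSketch {
  public static void main(String[] args) {
    ReadOnlyList<String> l =
        ReadOnlyList.wrap(Arrays.asList("", "a", "b", "c"));
    System.out.println(Arrays.toString(l.toArray()));
    System.out.println(Arrays.toString(l.subArray(1, 3)));
  }
}
```

Since the wrapper exposes no mutators, callers get the binary-search-friendly sorted view without being able to disturb the backing children list.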
[jira] [Updated] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12544: -- Attachment: HDFS-12544.04.patch Attached v04 patch to address the following: 1. Handled file rename/move case for the snapshot scope directory. 2. New unit test for the file rename. 3. Added more comments in the test and snapshot manager. 4. Fixed typos pointed out by Yongjun in the previous comment. [~yzhangal], can you please take a look at the patch? > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch, HDFS-12544.04.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root. 
[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211876#comment-16211876 ] Manoj Govindassamy commented on HDFS-12544: --- Thanks for the review comments [~yzhangal]. Good discussion on the file rename behavior w.r.t. snapshot diff for a descendant directory. That's right, the renamed files still show up in the diff report as "R" entries even though they are moved out of the scope (descendant) directory. To get the same behavior as the normal snapshot diff report, these renamed files whose target is not under the scoped directory should be shown as "D" (deleted) entries in the report. Will post a new patch to handle this case. > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. 
So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root.
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12614: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.0.0 > > Attachments: HDFS-12614.01.patch, HDFS-12614.02.patch, > HDFS-12614.03.patch, HDFS-12614.04.patch, HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206840#comment-16206840 ] Manoj Govindassamy commented on HDFS-12614: --- Thanks for the review [~daryn] and [~yzhangal]. Committed to trunk. > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.0.0 > > Attachments: HDFS-12614.01.patch, HDFS-12614.02.patch, > HDFS-12614.03.patch, HDFS-12614.04.patch, HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
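The committed HDFS-12614 patch itself is in the attachment, but the null-guard idea behind the workaround can be sketched as follows (class and method names here are hypothetical, chosen for illustration only): when building the String[] for the attributes provider, map a null path component to the empty root name instead of dereferencing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch only -- not the committed HDFS-12614 patch. The root
// path "/" can split into a component array whose first element is null;
// decoding that null is what triggered the NPE in
// FSPermissionChecker#getINodeAttrs(). Mapping null to "" sidesteps it.
public class PathComponentGuard {
  static String[] toStringComponents(byte[][] pathByNameArr, int pathIdx) {
    String[] elements = new String[pathIdx + 1];
    for (int i = 0; i < elements.length; i++) {
      byte[] component = pathByNameArr[i];
      // Null component (root) becomes ""; everything else is UTF-8 decoded.
      elements[i] = (component == null)
          ? "" : new String(component, StandardCharsets.UTF_8);
    }
    return elements;
  }

  public static void main(String[] args) {
    byte[][] root = { null }; // the shape "/" splits into
    System.out.println(Arrays.toString(toStringComponents(root, 0)));
  }
}
```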
[jira] [Commented] (HDFS-12653) Implement toArray() and subArray() for ReadOnlyList
[ https://issues.apache.org/jira/browse/HDFS-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204273#comment-16204273 ] Manoj Govindassamy commented on HDFS-12653: --- [~daryn], Currently ReadOnlyList is predominantly used by the Directory and Snapshot subsystems for storing their children inodes / snapshots in a _sorted_ order. I see it as a SortedList, and the users of this list often make use of the sorted nature of the elements for searching - {{ReadOnlyList#Util#binarySearch(ReadOnlyList, K key)}}. On top of these sorting benefits, {{ReadOnlyList#Util#asList()}} gives a {{List}} where {{toArray()}} differs significantly from the Collections toArray -- the returned array is more of a _view_ of the backing read only list, without copying any elements. I believe we can make use of ReadOnlyList for enhancing the performance of {{INodeAttributesProvider#getAttributes()}} by converting byte[][] bPathComponents to ReadOnlyList sPathComponents only one time and getting the _view_ of the string path components using toArray() or subArray(start, end). Collections doesn't have a subArray() concept; there's only subList(). > Implement toArray() and subArray() for ReadOnlyList > --- > > Key: HDFS-12653 > URL: https://issues.apache.org/jira/browse/HDFS-12653 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > > {{ReadOnlyList}} today gives an unmodifiable view of the backing List. This > list supports following Util methods for easy construction of read only views > of any given list. > {noformat} > public static ReadOnlyList asReadOnlyList(final List list) > public static List asList(final ReadOnlyList list) > {noformat} > {{asList}} above additionally overrides {{Object[] toArray()}} of the > {{java.util.List}} interface. Unlike the {{java.util.List}}, the above one > returns an array of Objects referring to the backing list and avoid any > copying of objects. 
Given that we have many usages of read only lists, > 1. Lets have a light-weight / shared-view {{toArray()}} implementation for > {{ReadOnlyList}} as well. > 2. Additionally, similar to {{java.util.List#subList(fromIndex, toIndex)}}, > lets have {{ReadOnlyList#subArray(fromIndex, toIndex)}}
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12614: -- Attachment: HDFS-12614.04.patch Thanks for the review [~daryn]. Thats right, string literals and constant string expressions are already interned. Attached 04 patch, removing the explicit string intern. Please take a look at the latest revision. > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12614.01.patch, HDFS-12614.02.patch, > HDFS-12614.03.patch, HDFS-12614.04.patch, HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12653) Implement toArray() and toSubArray() for ReadOnlyList
Manoj Govindassamy created HDFS-12653: - Summary: Implement toArray() and toSubArray() for ReadOnlyList Key: HDFS-12653 URL: https://issues.apache.org/jira/browse/HDFS-12653 Project: Hadoop HDFS Issue Type: Improvement Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy {{ReadOnlyList}} today gives an unmodifiable view of the backing List. This list supports the following Util methods for easy construction of read only views of any given list. {noformat} public static ReadOnlyList asReadOnlyList(final List list) public static List asList(final ReadOnlyList list) {noformat} {{asList}} above additionally overrides {{Object[] toArray()}} of the {{java.util.List}} interface. Unlike the {{java.util.List}}, the above one returns an array of Objects referring to the backing list and avoids any copying of objects. Given that we have many usages of read only lists, 1. Let's have a light-weight / shared-view {{toArray()}} implementation for {{ReadOnlyList}} as well. 2. Additionally, similar to {{java.util.List#subList(fromIndex, toIndex)}}, let's have {{ReadOnlyList#subArray(fromIndex, toIndex)}}
[jira] [Updated] (HDFS-12653) Implement toArray() and subArray() for ReadOnlyList
[ https://issues.apache.org/jira/browse/HDFS-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12653: -- Summary: Implement toArray() and subArray() for ReadOnlyList (was: Implement toArray() and toSubArray() for ReadOnlyList) > Implement toArray() and subArray() for ReadOnlyList > --- > > Key: HDFS-12653 > URL: https://issues.apache.org/jira/browse/HDFS-12653 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > > {{ReadOnlyList}} today gives an unmodifiable view of the backing List. This > list supports following Util methods for easy construction of read only views > of any given list. > {noformat} > public static ReadOnlyList asReadOnlyList(final List list) > public static List asList(final ReadOnlyList list) > {noformat} > {{asList}} above additionally overrides {{Object[] toArray()}} of the > {{java.util.List}} interface. Unlike the {{java.util.List}}, the above one > returns an array of Objects referring to the backing list and avoid any > copying of objects. Given that we have many usages of read only lists, > 1. Lets have a light-weight / shared-view {{toArray()}} implementation for > {{ReadOnlyList}} as well. > 2. Additionally, similar to {{java.util.List#subList(fromIndex, toIndex)}}, > lets have {{ReadOnlyList#subArray(fromIndex, toIndex)}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202794#comment-16202794 ] Manoj Govindassamy commented on HDFS-12614: --- Filed HDFS-12652 to track {{INodeAttributeProvider#getAttributes()}} performance improvement task detailed by [~daryn] in the previous comments. I am assuming that the request is not for changing the INodeAttributesProvider#getAttributes() interface. > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12614.01.patch, HDFS-12614.02.patch, > HDFS-12614.03.patch, HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12652) INodeAttributesProvider#getAttributes(): Avoid multiple conversions of path components byte[][] to String[] when requesting INode attributes
Manoj Govindassamy created HDFS-12652: - Summary: INodeAttributesProvider#getAttributes(): Avoid multiple conversions of path components byte[][] to String[] when requesting INode attributes Key: HDFS-12652 URL: https://issues.apache.org/jira/browse/HDFS-12652 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.0.0-beta1 Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy {{INodeAttributesProvider#getAttributes}} needs the path components passed in to be an array of Strings, whereas the INode and related layers maintain path components as an array of byte[]. So, these layers are required to convert each byte[] component of the path back into a String, multiple times, when requesting INode attributes from the Provider. That is, the path "/a/b/c" requires calling the attribute provider with: (1) "", (2) "", "a", (3) "", "a", "b", (4) "", "a", "b", "c". Every single one of those strings was freshly (re)converted from a byte[]. Say a file listing is done on a huge directory containing 100s of millions of files; then these repeated, redundant conversions of byte[][] to String[] create lots of tiny garbage objects, occupying memory and affecting performance. It would be better if we could avoid creating redundant copies of path component strings.
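As a rough illustration of the one-time conversion idea above (class and method names are hypothetical, not from any posted patch), the byte[][] components could be decoded exactly once, and each per-depth request for "/a/b/c" then served as a shallow copy of String references rather than a fresh byte-to-String decode:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: decode the byte[][] path components to String[] once,
// then answer every prefix-depth request with a shallow reference copy.
// Arrays.copyOfRange copies only the String references, not the characters,
// so the per-depth garbage is one small array instead of re-decoded Strings.
public class PathComponentCache {
  private final String[] components;

  public PathComponentCache(byte[][] pathByNameArr) {
    components = new String[pathByNameArr.length];
    for (int i = 0; i < components.length; i++) {
      byte[] c = pathByNameArr[i];
      // Root "/" splits to a null first component; map it to "".
      components[i] = (c == null) ? "" : new String(c, StandardCharsets.UTF_8);
    }
  }

  // Prefix for depth pathIdx, i.e. components [0 .. pathIdx] inclusive.
  public String[] prefix(int pathIdx) {
    return Arrays.copyOfRange(components, 0, pathIdx + 1);
  }

  public static void main(String[] args) {
    byte[][] raw = { null,
        "a".getBytes(StandardCharsets.UTF_8),
        "b".getBytes(StandardCharsets.UTF_8) };
    PathComponentCache cache = new PathComponentCache(raw);
    System.out.println(Arrays.toString(cache.prefix(1)));
    System.out.println(Arrays.toString(cache.prefix(2)));
  }
}
```

A ReadOnlyList-backed subArray() view, as proposed in HDFS-12653, could cut even the small per-depth array allocation shown here.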
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12614: -- Attachment: HDFS-12614.03.patch Attached v03 patch with more comments. [~yzhangal], [~daryn], can you please take a look at the latest patch revision? Thanks. > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12614.01.patch, HDFS-12614.02.patch, > HDFS-12614.03.patch, HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12614: -- Attachment: HDFS-12614.02.patch Thanks for the review [~daryn]. I had the same dilemma on whether to change the semantics for the root path component. I didn't see any functionalities failing because of this change though. But, I do concur that semantic change was riskier. Attached v02 patch to workaround the issue in {{FSPermissionChecker#getINodeAttrs()}} for the null root path component. Please take a look. I will track the other enhancement you talked about in a new jira. > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12614.01.patch, HDFS-12614.02.patch, > HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. 
[jira] [Updated] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory
[ https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12544: -- Attachment: HDFS-12544.03.patch Thanks for the review [~yzhangal]. Attached v03 patch to address the following comments. Can you please review the latest patch? bq. It seems to make sense to include a new field snapshotDiffScopeDir in the SnapshotDiffInfo class, and initialize it as the constructor. Done. bq. suggest to move the checking from SnapshotManager%getSnapshottableAncestorDir to its caller, .. Done. bq. suggest to remove the method SnapshotManager%setSnapshotDiffAllowSnapRootDescendant, and use the config property to pass on the value to the cluster.. Done. bq. Nit. In SnapshotManager.java, change "directories" to "directory" in the following text... Done. > SnapshotDiff - support diff generation on any snapshot root descendant > directory > > > Key: HDFS-12544 > URL: https://issues.apache.org/jira/browse/HDFS-12544 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, > HDFS-12544.03.patch > > > {noformat} > # hdfs snapshotDiff > > {noformat} > Using snapshot diff command, we can generate a diff report between any two > given snapshots under a snapshot root directory. The command today only > accepts the path that is a snapshot root. There are many deployments where > the snapshot root is configured at the higher level directory but the diff > report needed is only for a specific directory under the snapshot root. In > these cases, the diff report can be filtered for changes pertaining to the > directory we are interested in. But when the snapshot root directory is very > huge, the snapshot diff report generation can take minutes even if we are > interested to know the changes only in a small directory. 
So, it would be > highly performant if the diff report calculation can be limited to only the > interesting sub-directory of the snapshot root instead of the whole snapshot > root.
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Attachment: NN_UI_Summary_BlockCount_BeforeFix.png > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count.
[jira] [Updated] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
[ https://issues.apache.org/jira/browse/HDFS-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12629: -- Description: Currently NameNode UI displays total files and directories and total blocks in the cluster under the Summary tab. But, the total blocks count split by type is missing. It would be good if we can display total blocks counts by type (provided by HDFS-12573) along with the total block count. was:Currently NameNode UI displays total files and directories and total blocks in the cluster under the Summary tab. But, the total blocks count split by type is missing. It would be good if we can have these total blocks counts also displayed along with the total block count. > NameNode UI should report total blocks count by type - replicated and erasure > coded > --- > > Key: HDFS-12629 > URL: https://issues.apache.org/jira/browse/HDFS-12629 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: NN_UI_Summary_BlockCount_BeforeFix.png > > > Currently NameNode UI displays total files and directories and total blocks > in the cluster under the Summary tab. But, the total blocks count split by > type is missing. It would be good if we can display total blocks counts by > type (provided by HDFS-12573) along with the total block count. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12629) NameNode UI should report total blocks count by type - replicated and erasure coded
Manoj Govindassamy created HDFS-12629: - Summary: NameNode UI should report total blocks count by type - replicated and erasure coded Key: HDFS-12629 URL: https://issues.apache.org/jira/browse/HDFS-12629 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.0.0-beta1 Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy Currently NameNode UI displays total files and directories and total blocks in the cluster under the Summary tab. But, the total blocks count split by type is missing. It would be good if we can have these total blocks counts also displayed along with the total block count.
[jira] [Updated] (HDFS-12573) Divide the total block metrics into replica and ec
[ https://issues.apache.org/jira/browse/HDFS-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12573: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the patch contribution [~tasanuma0829]. Committed to trunk. > Divide the total block metrics into replica and ec > -- > > Key: HDFS-12573 > URL: https://issues.apache.org/jira/browse/HDFS-12573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, metrics, namenode >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Fix For: 3.0.0 > > Attachments: HDFS-12573.1.patch, HDFS-12573.2.patch, > HDFS-12573.3.patch > > > Following HDFS-10999, let's separate total blocks metrics. It would be useful > for administrators.
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12614: -- Affects Version/s: 3.0.0-beta1 Target Version/s: 3.0.0 Status: Patch Available (was: Open) > FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider > configured > -- > > Key: HDFS-12614 > URL: https://issues.apache.org/jira/browse/HDFS-12614 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-12614.01.patch, HDFS-12614.test.01.patch > > > When INodeAttributesProvider is configured, and when resolving path (like > "/") and checking for permission, the following code when working on > {{pathByNameArr}} throws NullPointerException. > {noformat} > private INodeAttributes getINodeAttrs(byte[][] pathByNameArr, int pathIdx, > INode inode, int snapshotId) { > INodeAttributes inodeAttrs = inode.getSnapshotINode(snapshotId); > if (getAttributesProvider() != null) { > String[] elements = new String[pathIdx + 1]; > for (int i = 0; i < elements.length; i++) { > elements[i] = DFSUtil.bytes2String(pathByNameArr[i]); <=== > } > inodeAttrs = getAttributesProvider().getAttributes(elements, > inodeAttrs); > } > return inodeAttrs; > } > {noformat} > Looks like for paths like "/" where the split components based on delimiter > "/" can be null, the pathByNameArr array can have null elements and can throw > NPE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-12614:
--
    Attachment: HDFS-12614.01.patch

Attached v01 patch to address the issue in {{FSDirectory#resolvePath()}} when {{INodeAttributesProvider}} is enabled. [~eddyxu], [~kihwal], [~daryn], can you please take a look?

> FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider
> configured
> --
>
>                 Key: HDFS-12614
>                 URL: https://issues.apache.org/jira/browse/HDFS-12614
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-12614.01.patch, HDFS-12614.test.01.patch
[jira] [Updated] (HDFS-12614) FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider configured
[ https://issues.apache.org/jira/browse/HDFS-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-12614:
--
    Attachment: HDFS-12614.test.01.patch

Attached a test case to show the problem with INodeAttributesProvider and path resolving.

> FSPermissionChecker#getINodeAttrs() throws NPE when INodeAttributesProvider
> configured
> --
>
>                 Key: HDFS-12614
>                 URL: https://issues.apache.org/jira/browse/HDFS-12614
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-12614.test.01.patch
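The null component described in the issue can be reproduced and guarded in isolation. Below is a minimal sketch of a null-safe conversion of the {{byte[][]}} path components to {{String[]}} — the helper names ({{toElements}}, {{bytes2String}}) are hypothetical stand-ins for illustration, not the committed HDFS-12614 patch:

```java
// Sketch: for a path like "/", splitting on "/" yields a null first
// component, so pathByNameArr[0] == null and a direct bytes-to-String
// conversion throws NullPointerException. Mapping null to "" avoids it.
public class GetINodeAttrsSketch {

    // Stand-in for DFSUtil.bytes2String(), for illustration only.
    static String bytes2String(byte[] bytes) {
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    }

    // Null-safe conversion of path components [0 .. pathIdx] to strings.
    static String[] toElements(byte[][] pathByNameArr, int pathIdx) {
        String[] elements = new String[pathIdx + 1];
        for (int i = 0; i < elements.length; i++) {
            // Guard the null root component instead of dereferencing it.
            elements[i] = (pathByNameArr[i] == null)
                ? "" : bytes2String(pathByNameArr[i]);
        }
        return elements;
    }

    public static void main(String[] args) {
        byte[][] root = { null };  // components of "/" after splitting
        String[] elements = toElements(root, 0);
        System.out.println(elements.length + " " + elements[0].isEmpty());
        // prints "1 true"
    }
}
```

The alternative chosen in the v01 patch (fixing resolution in {{FSDirectory#resolvePath()}} so the null component never reaches {{getINodeAttrs()}}) addresses the same NPE one layer earlier.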