[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321609#comment-15321609 ]
Arpit Agarwal commented on HDFS-9530:
-------------------------------------

I took a deeper look at the reservation code. It was painful to see my own lack of thoroughness.

I mostly agree with Brahma's analysis. Since reservation is done only for replicas created via BlockReceiver, there are a couple of potential culprits where the reservation could be leaked:
* Failure in {{DataXceiver#writeBlock}} after creating the BlockReceiver.
* BlockReceiver receives an unchecked exception after reserving.

I also agree with [~vinayrpet] that releasing via invalidate is the safer option, although it can lead to the reserved space hanging around longer.

Do we agree on the following summary of the contract for when space should be reserved and released?
# Space is reserved only when the on-disk block file is successfully created for an rbw/temporary replica. This is verifiably true in FsDatasetImpl#createTemporary and FsDatasetImpl#createRbw, barring OOM when the ReplicaInPipeline/ReplicaBeingWritten is allocated.
# Space continues to be reserved as long as there is an rbw/temporary replica in the volumeMap.
# Space must be released either when the replica is finalized or when it is invalidated. FsDatasetImpl#finalizeReplica handles the finalize case. Fixing invalidate would close the remaining gap.
## Space may be released earlier if a failure is detected earlier, e.g. an exception in BlockReceiver, which we handle today.
## Space may also be released incrementally when some bytes are written to disk, which is handled via ReplicaInPipeline#setBytesAcked.

Thanks again for the detailed analysis on this one Brahma, [~raviprak] and Vinay. Nice work.
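
To make the proposed contract concrete, here is a minimal sketch of the bookkeeping it implies. It is a toy model, not the FsDatasetImpl code: the class {{ReservationSketch}} and all of its method names are hypothetical, and the incremental release takes a delta of newly acked bytes as a simplification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the reserve/release contract; not actual HDFS code. */
class ReservationSketch {
    /** Bytes still reserved for each in-flight (rbw/temporary) replica, keyed by block id. */
    private final Map<Long, Long> reservedByBlock = new HashMap<>();
    private long totalReserved = 0L;

    /** Rule 1: reserve only after the on-disk block file has been created. */
    void onReplicaCreated(long blockId, long bytesToReserve) {
        if (reservedByBlock.containsKey(blockId)) {
            throw new IllegalStateException("block " + blockId + " already has a reservation");
        }
        reservedByBlock.put(blockId, bytesToReserve);
        totalReserved += bytesToReserve;
    }

    /** Rule 3, second sub-item: release incrementally as bytes reach disk. */
    void onBytesAcked(long blockId, long newlyAckedBytes) {
        Long remaining = reservedByBlock.get(blockId);
        if (remaining == null) {
            return; // nothing reserved (already finalized or invalidated)
        }
        long released = Math.min(remaining, newlyAckedBytes);
        reservedByBlock.put(blockId, remaining - released);
        totalReserved -= released;
    }

    /** Rule 3: finalizing releases whatever is still reserved. */
    void onFinalized(long blockId) {
        releaseAll(blockId);
    }

    /** Rule 3: invalidating releases whatever is still reserved (the gap discussed above). */
    void onInvalidated(long blockId) {
        releaseAll(blockId);
    }

    /** Idempotent: a second call for the same block releases nothing. */
    private void releaseAll(long blockId) {
        Long remaining = reservedByBlock.remove(blockId);
        if (remaining != null) {
            totalReserved -= remaining;
        }
    }

    long totalReserved() {
        return totalReserved;
    }
}
```

Because the remaining reservation is tracked per replica and removed exactly once, a failure path that ends in invalidate cannot leak space in this model, and an early release followed by a later invalidate cannot release twice.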
> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -----------------------------------------
>
> Key: HDFS-9530
> URL: https://issues.apache.org/jira/browse/HDFS-9530
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: Fei Hui
> Attachments: HDFS-9530-01.patch
>
>
> I think there are bugs in HDFS
> ===============================================================================
> here is config
> <property>
>   <name>dfs.datanode.data.dir</name>
>   <value>
>     file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2
>   </value>
> </property>
> here is dfsadmin report
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 238604832768 (222.22 GB)
> DFS Remaining: 215772954624 (200.95 GB)
> DFS Used: 22831878144 (21.26 GB)
> DFS Used%: 9.57%
> Under replicated blocks: 4
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7190958080 (6.70 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72343986176 (67.38 GB)
> DFS Used%: 8.96%
> DFS Remaining%: 90.14%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:02 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7219073024 (6.72 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72315871232 (67.35 GB)
> DFS Used%: 9.00%
> DFS Remaining%: 90.11%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 8421847040 (7.84 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 71113097216 (66.23 GB)
> DFS Used%: 10.49%
> DFS Remaining%: 88.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> ================================================================================
> when running hive job, dfsadmin report as follows
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 108266011136 (100.83 GB)
> DFS Remaining: 80078416384 (74.58 GB)
> DFS Used: 28187594752 (26.25 GB)
> DFS Used%: 26.04%
> Under replicated blocks: 7
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9015627776 (8.40 GB)
> Non DFS Used: 44303742464 (41.26 GB)
> DFS Remaining: 26937047552 (25.09 GB)
> DFS Used%: 11.23%
> DFS Remaining%: 33.56%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 693
> Last contact: Wed Dec 09 15:37:35 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9163116544 (8.53 GB)
> Non DFS Used: 47895897600 (44.61 GB)
> DFS Remaining: 23197403648 (21.60 GB)
> DFS Used%: 11.42%
> DFS Remaining%: 28.90%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 750
> Last contact: Wed Dec 09 15:37:36 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 10008850432 (9.32 GB)
> Non DFS Used: 40303602176 (37.54 GB)
> DFS Remaining: 29943965184 (27.89 GB)
> DFS Used%: 12.47%
> DFS Remaining%: 37.31%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 632
> Last contact: Wed Dec 09 15:37:36 CST 2015
> =========================================================================
> but, df output is as follows on worker-1
> [hadoop@worker-1 ~]$ df
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/xvda1      20641404 4229676  15363204  22% /
> tmpfs            8165456       0   8165456   0% /dev/shm
> /dev/xvdc       20642428 2596920  16996932  14% /mnt/disk3
> /dev/xvdb       20642428 2692228  16901624  14% /mnt/disk4
> /dev/xvdd       20642428 2445852  17148000  13% /mnt/disk2
> /dev/xvde       20642428 2909764  16684088  15% /mnt/disk1
> df output conflicts with dfsadmin report
> any suggestions?
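
The per-datanode figures quoted above are consistent with Non DFS Used being derived as Configured Capacity minus DFS Used minus DFS Remaining. If that is how the report computes it (an assumption here, checked only against the quoted numbers), then anything that lowers DFS Remaining without writing data, such as a leaked reservation, would show up as Non DFS Used even though df sees no extra usage. A small check using the worker-2 numbers, with a hypothetical helper name:

```java
/** Hypothetical helper that re-derives Non DFS Used from the quoted report values. */
class NonDfsUsedCheck {
    /** Assumed relation: capacity minus DFS used minus DFS remaining. */
    static long nonDfsUsed(long configuredCapacity, long dfsUsed, long dfsRemaining) {
        return configuredCapacity - dfsUsed - dfsRemaining;
    }

    public static void main(String[] args) {
        long capacity = 80256417792L; // worker-2 Configured Capacity in both reports

        // Idle cluster: the report shows Non DFS Used = 721473536 (688.05 MB).
        System.out.println(nonDfsUsed(capacity, 7190958080L, 72343986176L));

        // During the Hive job: the report shows Non DFS Used = 44303742464 (41.26 GB).
        System.out.println(nonDfsUsed(capacity, 9015627776L, 26937047552L));
    }
}
```

Both derived values match the reported ones, so the jump in Non DFS Used mirrors the drop in DFS Remaining rather than any growth in on-disk usage.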