[jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1

Arpit Agarwal (JIRA) Wed, 08 Jun 2016 15:56:01 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321609#comment-15321609
 ]


Arpit Agarwal commented on HDFS-9530:
-------------------------------------

I took a deeper look at the reservation code. It was painful to see my own lack 
of thoroughness.

I mostly agree with Brahma's analysis. Since reservation is done only for 
replicas created via BlockReceiver, there are a couple of potential culprits 
where the reservation could be leaked:
* Failure in {{DataXceiver#writeBlock}} after creating the BlockReceiver.
* BlockReceiver receives an unchecked exception after reserving.

Also agree with [~vinayrpet] that releasing via invalidate is the safer option 
although it can lead to the reserved space hanging around longer. 

Do we agree on the following summary of the contract for when space should be 
reserved and released?
# Space is reserved only when the on-disk block file is successfully created 
for an rbw/temporary replica. This is verifiably true in 
FsDatasetImpl#createTemporary and FsDatasetImpl#createRbw barring OOM when the 
ReplicaInPipeline/ReplicaBeingWritten is allocated.
# Space continues to be reserved as long as there is an rbw/temporary in the 
volumeMap.
# Space must be released either when the replica is finalized or it is 
invalidated. FsDatasetImpl#finalizeReplica handles the finalize case. Fixing 
invalidate would close the remaining gap.
## Space may be released earlier if a failure is detected earlier e.g. 
exception in BlockReceiver which we handle today.
## Space may also be released incrementally when some bytes are written to disk 
which is handled via ReplicaInPipeline#setBytesAcked.

Thanks again for the detailed analysis on this one Brahma, [~raviprak] and 
Vinay. Nice work.

> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -----------------------------------------
>
>                 Key: HDFS-9530
>                 URL: https://issues.apache.org/jira/browse/HDFS-9530
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Fei Hui
>         Attachments: HDFS-9530-01.patch
>
>
> i think there are bugs in HDFS
> ===============================================================================
> here is config
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>
>         
> file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2
>     </value>
>   </property>
> here is dfsadmin report 
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 238604832768 (222.22 GB)
> DFS Remaining: 215772954624 (200.95 GB)
> DFS Used: 22831878144 (21.26 GB)
> DFS Used%: 9.57%
> Under replicated blocks: 4
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7190958080 (6.70 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72343986176 (67.38 GB)
> DFS Used%: 8.96%
> DFS Remaining%: 90.14%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:02 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7219073024 (6.72 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72315871232 (67.35 GB)
> DFS Used%: 9.00%
> DFS Remaining%: 90.11%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 8421847040 (7.84 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 71113097216 (66.23 GB)
> DFS Used%: 10.49%
> DFS Remaining%: 88.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> ================================================================================
> when running hive job , dfsadmin report as follows
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 108266011136 (100.83 GB)
> DFS Remaining: 80078416384 (74.58 GB)
> DFS Used: 28187594752 (26.25 GB)
> DFS Used%: 26.04%
> Under replicated blocks: 7
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9015627776 (8.40 GB)
> Non DFS Used: 44303742464 (41.26 GB)
> DFS Remaining: 26937047552 (25.09 GB)
> DFS Used%: 11.23%
> DFS Remaining%: 33.56%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 693
> Last contact: Wed Dec 09 15:37:35 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9163116544 (8.53 GB)
> Non DFS Used: 47895897600 (44.61 GB)
> DFS Remaining: 23197403648 (21.60 GB)
> DFS Used%: 11.42%
> DFS Remaining%: 28.90%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 750
> Last contact: Wed Dec 09 15:37:36 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 10008850432 (9.32 GB)
> Non DFS Used: 40303602176 (37.54 GB)
> DFS Remaining: 29943965184 (27.89 GB)
> DFS Used%: 12.47%
> DFS Remaining%: 37.31%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 632
> Last contact: Wed Dec 09 15:37:36 CST 2015
> =========================================================================
> but, df output is as follows on worker-1
> [hadoop@worker-1 ~]$ df
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/xvda1      20641404 4229676  15363204  22% /
> tmpfs            8165456       0   8165456   0% /dev/shm
> /dev/xvdc       20642428 2596920  16996932  14% /mnt/disk3
> /dev/xvdb       20642428 2692228  16901624  14% /mnt/disk4
> /dev/xvdd       20642428 2445852  17148000  13% /mnt/disk2
> /dev/xvde       20642428 2909764  16684088  15% /mnt/disk1
> df output conflitcs with dfsadmin report
> any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1

Reply via email to