[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-6626:
----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.0.0, 0.99.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
----------------------------------------------------------------------------------
    Key: HBASE-6626
    URL: https://issues.apache.org/jira/browse/HBASE-6626
    Project: HBase
    Issue Type: Improvement
    Components: documentation
    Affects Versions: 0.95.2
    Reporter: Nicolas Liochon
    Assignee: Misty Stanley-Jones
    Priority: Blocker
    Fix For: 0.99.0, 2.0.0
    Attachments: HBASE-6626.patch, troubleshooting.txt

I looked mainly at the major failure case, but here is what I have:

New sub-chapter in the existing chapter "Troubleshooting and Debugging HBase": HDFS & HBase

1) HDFS & HBase
2) Connection-related settings
2.1) Number of retries
2.2) Timeouts
3) Log samples

1) HDFS & HBase

HBase uses HDFS to store its HFiles (the core HBase data files) and its Write-Ahead-Logs (the files used to restore the data after a crash). In both cases, the reliability of HBase comes from the fact that HDFS writes the data to multiple locations. To be efficient, HBase needs the data to be available locally, so it is highly recommended to run an HDFS datanode on the same machines as the HBase Region Servers. Detailed information on how HDFS works can be found at [1]. The important points are:

- HBase is a client application of HDFS, i.e. it uses the HDFS DFSClient class. This class can appear in HBase logs alongside other HDFS-client-related log lines.
- Some HDFS settings are HDFS-server-side, i.e. must be set on the HDFS side; others are HDFS-client-side, i.e. must be set in HBase; and some must be set in both places.
- HDFS writes are pipelined from one datanode to another. When writing, there are communications between:
  - HBase and the HDFS namenode, through the HDFS client classes.
  - HBase and the HDFS datanodes, through the HDFS client classes.
  - The HDFS datanodes themselves. Issues in these communications show up in the HDFS logs, not in the HBase logs. HDFS writes are always local when possible, so there should not be many write errors in the HBase Region Servers: they write to the local datanode. If that datanode cannot replicate the blocks, the errors appear in its logs, not in the region server logs.
- Datanodes can be contacted through the ipc.Client interface (once again, this class can show up in HBase logs) and through the data transfer interface (which usually shows up as the DataNode class in the HBase logs). These are on different ports (the defaults being 50010 and 50020).
- To understand exactly what is going on, you must look at the HDFS log files as well: the HBase logs represent only the client side.
- With the default settings, HDFS needs 630s to mark a datanode as dead (with classic defaults this is 2 x the 300s namenode heartbeat recheck interval plus 10 x the 3s heartbeat interval). Until then, the node will still be tried by HBase and by other datanodes when writing and reading, until HDFS definitively decides it is dead, which adds some extra lines to the logs. This monitoring is performed by the NameNode.
- The HDFS clients (i.e. HBase using the HDFS client code) do not rely solely on the NameNode: they can temporarily mark a node as dead themselves if they got an error when they tried to use it.

2) Settings for retries and timeouts

2.1) Retries

ipc.client.connect.max.retries
Default: 10
The number of retries a client will make to establish a server connection. Not taken into account if the error is a SocketTimeout; in that case the number of retries is 45 (fixed on branch, in HADOOP-7932 or HADOOP-7397). For SASL, the number of retries is hard-coded to 15. Can be increased, especially if the socket timeouts have been lowered.

ipc.client.connect.max.retries.on.timeouts
Default: 45
If you have HADOOP-7932, the maximum number of retries on a timeout. Its counter is separate from ipc.client.connect.max.retries, so if you mix socket errors and timeouts you can get up to 10 + 45 = 55 retries with the default values. Could be lowered, once it is available. With HADOOP-7397, ipc.client.connect.max.retries is reused instead, so there would be 10 tries.

dfs.client.block.write.retries
Default: 3
The number of tries the client makes when writing a block. After a failure, it reconnects to the namenode to get a new location, sending the list of the datanodes already tried without success. Could be increased, especially if the socket timeouts have been lowered. See HBASE-6490.

dfs.client.block.write.locateFollowingBlock.retries
Default: 5
The number of retries to the namenode when the client gets a NotReplicatedYetException, i.e. the existing blocks of the file are not yet replicated to dfs.replication.min. This should not impact HBase, as dfs.replication.min defaults to 1.
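For illustration, here is a minimal sketch of how the client-side retry settings above could be set. Since HBase is the HDFS client, these belong in hbase-site.xml on the HBase side. The values shown are just the defaults restated, not tuning recommendations:

  <!-- hbase-site.xml: HBase acts as the HDFS client, so HDFS/IPC
       client-side settings belong here. Example values only. -->
  <configuration>
    <property>
      <name>ipc.client.connect.max.retries</name>
      <value>10</value> <!-- retries on connect errors other than SocketTimeout -->
    </property>
    <property>
      <name>ipc.client.connect.max.retries.on.timeouts</name>
      <value>45</value> <!-- retries on SocketTimeout, if HADOOP-7932 is in -->
    </property>
    <property>
      <name>dfs.client.block.write.retries</name>
      <value>3</value> <!-- block-write attempts before asking the namenode for a new pipeline -->
    </property>
  </configuration>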
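Similarly, a sketch of the server-side knobs behind the 630s dead-datanode delay mentioned above. These belong in hdfs-site.xml on the HDFS side; the property names below are the Hadoop 2.x ones (older releases used heartbeat.recheck.interval), so verify them against your Hadoop version:

  <!-- hdfs-site.xml (HDFS server side): the namenode declares a datanode
       dead after 2 * recheck-interval + 10 * heartbeat-interval,
       i.e. 2 * 300s + 10 * 3s = 630s with these defaults. -->
  <configuration>
    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value> <!-- seconds between datanode heartbeats -->
    </property>
    <property>
      <name>dfs.namenode.heartbeat.recheck-interval</name>
      <value>300000</value> <!-- milliseconds (5 minutes) -->
    </property>
  </configuration>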
[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misty Stanley-Jones updated HBASE-6626:
---------------------------------------
    Attachment: HBASE-6626.patch

I made an attempt. I did not integrate the info in the comments, but I did check the initial content and updated the Hadoop parameters and defaults where needed. I left a couple of the parameters out because they didn't seem to exist anymore or were marked as 'expert' in the HDFS config docs. I would consider 'expert' parameters for HDFS to be out of scope and possibly dangerous for HBase to recommend tweaking. WDYT?
[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misty Stanley-Jones updated HBASE-6626:
---------------------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misty Stanley-Jones updated HBASE-6626:
---------------------------------------
    Attachment: (was: HBASE-6626.patch)
[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misty Stanley-Jones updated HBASE-6626:
---------------------------------------
    Attachment: HBASE-6626.patch

Re-generated the patch. I can apply it to the current master, so I'm not sure what is wrong.
[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-6626:
-------------------------
    Attachment: troubleshooting.txt

Started converting to docbook. Nicolas, you are missing the [1] link below. What did you intend to point to?