[jira] Updated: (HDFS-849) .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-849:


Hadoop Flags: [Reviewed]

+1 patch looks good.

 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 ---

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 
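
 For illustration, a minimal standalone sketch of the countdown logic described 
 above (class and method names are hypothetical, not the fault-injection 
 framework's actual code): a trigger initialized to 3 is decremented once per 
 packet and only fires on the fourth packet, so a 3-packet file never sees a fault.
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a countdown fault trigger set to 3 fires on the 4th packet.
public class CountdownTriggerDemo {
  static class CountdownTrigger {
    private final AtomicInteger remaining;
    CountdownTrigger(int countDown) { this.remaining = new AtomicInteger(countDown); }
    // Decrement once per packet; inject only after the countdown has reached zero.
    boolean shouldInject() { return remaining.getAndDecrement() <= 0; }
  }

  public static void main(String[] args) {
    CountdownTrigger trigger = new CountdownTrigger(3);
    int packetsInFile = 3;                    // the test writes only 3 packets
    boolean injected = false;
    for (int packet = 1; packet <= packetsInFile; packet++) {
      injected |= trigger.shouldInject();     // false for packets 1..3
    }
    System.out.println(injected ? "fault injected" : "no fault injected");
  }
}
{code}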

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.

2009-12-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-814:


Attachment: h814_20091221_0.21.patch

h814_20091221_0.21.patch: for 0.21

 Add an api to get the visible length of a DFSDataInputStream.
 -

 Key: HDFS-814
 URL: https://issues.apache.org/jira/browse/HDFS-814
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.21.0, 0.22.0

 Attachments: h814_20091221.patch, h814_20091221_0.21.patch


 Hflush guarantees that the bytes written before are visible to the new 
 readers.  However, there is no way to get the length of the visible bytes.  
 The visible length is useful in some applications like SequenceFile.
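
 As a rough usage illustration (the getVisibleLength() accessor and its location 
 on DFSClient.DFSDataInputStream are assumptions based on the attached patches, 
 not a confirmed API):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSClient.DFSDataInputStream;

// Hypothetical sketch: read only the bytes already made visible by hflush().
public class VisibleLengthExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream in = fs.open(new Path(args[0]));
    if (in instanceof DFSDataInputStream) {
      long visible = ((DFSDataInputStream) in).getVisibleLength();
      byte[] buf = new byte[(int) Math.min(visible, 4096)];
      in.readFully(0, buf);   // safe: these bytes were flushed before the open
      System.out.println("visible bytes: " + visible);
    }
    in.close();
  }
}
{code}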

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-116) NPE if the system can't determine its own name and you go DNS.getDefaultHost(null)

2009-12-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794153#action_12794153
 ] 

Steve Loughran commented on HDFS-116:
-

May be better to have this fail with a meaningful error: "we don't know our own 
hostname, please fix your underlying system configuration".
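
A minimal sketch of the kind of guard being suggested (illustrative only, not 
the committed fix; the fallback path is elided):
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative only: turn a failed local-host lookup into a meaningful error
// rather than letting NetworkInterface.getByName(null) throw an NPE deeper down.
public class DefaultHostSketch {
  static String getDefaultHost(String strInterface) throws UnknownHostException {
    if (strInterface == null || "default".equals(strInterface)) {
      try {
        return InetAddress.getLocalHost().getCanonicalHostName();
      } catch (UnknownHostException e) {
        throw new UnknownHostException(
            "We don't know our own hostname; please fix the underlying "
            + "system configuration");
      }
    }
    throw new UnsupportedOperationException(
        "interface-specific lookup not shown in this sketch");
  }
}
{code}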

 NPE if the system can't determine its own name and you go 
 DNS.getDefaultHost(null)
 --

 Key: HDFS-116
 URL: https://issues.apache.org/jira/browse/HDFS-116
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor

 In a test case that I am newly writing, on my infamous home machine with 
 broken DNS, I can't call getByName(null) without seeing a stack trace:
 Testcase: testNullInterface took 0.014 sec
   Caused an ERROR
 null
 java.lang.NullPointerException
   at java.net.NetworkInterface.getByName(NetworkInterface.java:226)
   at org.apache.hadoop.net.DNS.getIPs(DNS.java:94)
   at org.apache.hadoop.net.DNS.getHosts(DNS.java:141)
   at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:218)
   at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:235)
   at org.apache.hadoop.net.TestDNS.testNullInterface(TestDNS.java:62)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.

2009-12-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794166#action_12794166
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-814:
-

I forgot to say that the failure of TestFiDataTransferProtocol2 is not related 
to this.  See HDFS-849.

 Add an api to get the visible length of a DFSDataInputStream.
 -

 Key: HDFS-814
 URL: https://issues.apache.org/jira/browse/HDFS-814
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.21.0, 0.22.0

 Attachments: h814_20091221.patch, h814_20091221_0.21.patch


 Hflush guarantees that the bytes written before are visible to the new 
 readers.  However, there is no way to get the length of the visible bytes.  
 The visible length is useful in some applications like SequenceFile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-849) .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-849:
---

Status: Open  (was: Patch Available)

 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 ---

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-849) TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-849:
---

Summary: TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails  (was: 
.TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails)

 TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 --

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-849) TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794178#action_12794178
 ] 

Hairong Kuang commented on HDFS-849:


Hudson seemed to have some problem testing this patch, so I am canceling it. This 
patch affects only fault injection tests. I have run the fault injection tests multiple times 
and they all passed.

 TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 --

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794189#action_12794189
 ] 

Hudson commented on HDFS-814:
-

Integrated in Hadoop-Hdfs-trunk-Commit #155 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/155/])
HDFS-814. Add an api to get the visible length of a DFSDataInputStream.


 Add an api to get the visible length of a DFSDataInputStream.
 -

 Key: HDFS-814
 URL: https://issues.apache.org/jira/browse/HDFS-814
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.21.0, 0.22.0

 Attachments: h814_20091221.patch, h814_20091221_0.21.patch


 Hflush guarantees that the bytes written before are visible to the new 
 readers.  However, there is no way to get the length of the visible bytes.  
 The visible length is useful in some applications like SequenceFile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-116) NPE if the system can't determine its own name and you go DNS.getDefaultHost(null)

2009-12-23 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794194#action_12794194
 ] 

Eli Collins commented on HDFS-116:
--

+1 That sounds great.

 NPE if the system can't determine its own name and you go 
 DNS.getDefaultHost(null)
 --

 Key: HDFS-116
 URL: https://issues.apache.org/jira/browse/HDFS-116
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor

 In a test case that I am newly writing, on my infamous home machine with 
 broken DNS, I can't call getByName(null) without seeing a stack trace:
 Testcase: testNullInterface took 0.014 sec
   Caused an ERROR
 null
 java.lang.NullPointerException
   at java.net.NetworkInterface.getByName(NetworkInterface.java:226)
   at org.apache.hadoop.net.DNS.getIPs(DNS.java:94)
   at org.apache.hadoop.net.DNS.getHosts(DNS.java:141)
   at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:218)
   at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:235)
   at org.apache.hadoop.net.TestDNS.testNullInterface(TestDNS.java:62)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-42) NetUtils.createSocketAddr NPEs if dfs.datanode.ipc.address is not set for a data node

2009-12-23 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-42:
---

Affects Version/s: 0.22.0
   0.21.0
   0.20.2
   0.20.1

 NetUtils.createSocketAddr NPEs if dfs.datanode.ipc.address is not set for a 
 data node
 -

 Key: HDFS-42
 URL: https://issues.apache.org/jira/browse/HDFS-42
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor

 DataNode.startDatanode assumes that a configuration always returns a non-null 
 dfs.datanode.ipc.address value, as the result is passed straight down to 
 NetUtils.createSocketAddr
 InetSocketAddress ipcAddr = NetUtils.createSocketAddr(
 conf.get(dfs.datanode.ipc.address));
 which triggers an NPE
 Caused by: java.lang.NullPointerException
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:119)
 at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:353)
 at org.apache.hadoop.dfs.DataNode.(DataNode.java:185)
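
 A sketch of the kind of validation implied above (illustrative only, not the 
 eventual patch):
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;

// Illustrative: check the configuration value before handing it to
// NetUtils.createSocketAddr, so a missing dfs.datanode.ipc.address fails
// with a clear message instead of a NullPointerException.
public class IpcAddressGuardSketch {
  static InetSocketAddress getIpcAddr(Configuration conf) {
    String addr = conf.get("dfs.datanode.ipc.address");
    if (addr == null) {
      throw new IllegalArgumentException(
          "dfs.datanode.ipc.address is not set in the datanode configuration");
    }
    return NetUtils.createSocketAddr(addr);
  }
}
{code}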

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Moved: (HDFS-851) NPE in FSDir.getBlockInfo

2009-12-23 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HADOOP-4128 to HDFS-851:
-

  Component/s: (was: fs)
   data-node
Affects Version/s: (was: 0.19.0)
   0.20.1
  Key: HDFS-851  (was: HADOOP-4128)
  Project: Hadoop HDFS  (was: Hadoop Common)

 NPE in FSDir.getBlockInfo
 -

 Key: HDFS-851
 URL: https://issues.apache.org/jira/browse/HDFS-851
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.1
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: hadoop-4128.patch


 This could well be something I've introduced on my variant of the code, 
 although it's a recent arrival in my own tests: an NPE while bringing up a 
 datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-12-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794220#action_12794220
 ] 

Steve Loughran commented on HDFS-775:
-

committed.

 FSDataset calls getCapacity() twice -bug?
 -

 Key: HDFS-775
 URL: https://issues.apache.org/jira/browse/HDFS-775
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-775-1.patch, HDFS-775-2.patch


 I'm not sure whether this is a bug or intended, but I thought I'd mention it.
 FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
 capacity. Although there is caching to stop the shell being exec'd twice in a 
 row, there is a risk that the first call doesn't run the shell and the 
 second does, so the value changes during the method. 
 If that is not intended, it is better to cache the first value for the whole 
 method.
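
 A minimal sketch of the caching suggested above (illustrative; DF is the 
 shell-backed org.apache.hadoop.fs.DF helper):
{code}
import org.apache.hadoop.fs.DF;

// Illustrative only: sample DF.getCapacity() once and reuse the cached value,
// so every use within the method sees the same number even if a second shell
// invocation would return something different.
public class CachedCapacitySketch {
  static long capacityMinusReserved(DF usage, long reserved) {
    long capacity = usage.getCapacity();   // single sample of the volume size
    long available = capacity - reserved;  // reuse the cached sample
    return available > 0 ? available : 0;
  }
}
{code}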

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Work started: (HDFS-301) Provide better error messages when fs.default.name is invalid

2009-12-23 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-301 started by Steve Loughran.

 Provide better error messages when fs.default.name is invalid
 -

 Key: HDFS-301
 URL: https://issues.apache.org/jira/browse/HDFS-301
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: HADOOP-5095-1.patch


 This is the follow-on to HADOOP-5687 - it's not enough to detect bad URIs; we need 
 good error messages and a set of tests to make sure everything works as 
 intended.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-849) TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang resolved HDFS-849.


Resolution: Fixed

I've committed this!

 TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 --

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-12-23 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-775:


Resolution: Fixed
  Assignee: Steve Loughran
Status: Resolved  (was: Patch Available)

 FSDataset calls getCapacity() twice -bug?
 -

 Key: HDFS-775
 URL: https://issues.apache.org/jira/browse/HDFS-775
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-775-1.patch, HDFS-775-2.patch


 I'm not sure whether this is a bug or intended, but I thought I'd mention it.
 FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
 capacity. Although there is caching to stop the shell being exec'd twice in a 
 row, there is a risk that the first call doesn't run the shell and the 
 second does, so the value changes during the method. 
 If that is not intended, it is better to cache the first value for the whole 
 method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2009-12-23 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794224#action_12794224
 ] 

Alex Newman commented on HDFS-200:
--

Anything going on here? Can we expect this to be fixed by 0.21.0?

 In HDFS, sync() not yet guarantees data available to the new readers
 

 Key: HDFS-200
 URL: https://issues.apache.org/jira/browse/HDFS-200
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
 Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
 fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
 fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
 fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
 fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
 fsyncConcurrentReaders9.patch, 
 hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
 hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
 namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
 ReopenProblem.java, Writer.java, Writer.java


 In the append design doc 
 (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
 says
 * A reader is guaranteed to be able to read data that was 'flushed' before 
 the reader opened the file
 However, this feature is not yet implemented.  Note that the operation 
 'flushed' is now called sync.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-301) Provide better error messages when fs.default.name is invalid

2009-12-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794227#action_12794227
 ] 

Steve Loughran commented on HDFS-301:
-

This is going to be a fun patch to nurture through

* Some of the error handling is in 
src/core/org/apache/hadoop/fs/FileSystem.java, in -common
* More error handling is in the Namenode, patches and testing go into -hdfs

I propose splitting the two
# -common code can have its own test, go into common
# Namenode patches can go in, test code moved to Junit4

The second patch depends on the first, or at least its tests do; there are no 
compile-time dependencies.

 Provide better error messages when fs.default.name is invalid
 -

 Key: HDFS-301
 URL: https://issues.apache.org/jira/browse/HDFS-301
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: HADOOP-5095-1.patch


 This is the follow-on to HADOOP-5687 - it's not enough to detect bad URIs; we need 
 good error messages and a set of tests to make sure everything works as 
 intended.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-599) Improve Namenode robustness by prioritizing datanode heartbeats over client requests

2009-12-23 Thread Dmytro Molkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Molkov reassigned HDFS-599:
--

Assignee: Dmytro Molkov  (was: dhruba borthakur)

 Improve Namenode robustness by prioritizing datanode heartbeats over client 
 requests
 

 Key: HDFS-599
 URL: https://issues.apache.org/jira/browse/HDFS-599
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Dmytro Molkov

 The namenode processes RPC requests from clients that are reading/writing to 
 files as well as heartbeats/block reports from datanodes.
 Sometimes, for various reasons (Java GC runs, inconsistent performance 
 of the NFS filer that stores HDFS transaction logs, etc.), the namenode 
 encounters transient slowness. For example, if the device that stores the 
 HDFS transaction logs becomes sluggish, the Namenode's ability to process 
 RPCs slows down to a certain extent. During this time, the RPCs from clients 
 as well as the RPCs from datanodes suffer in a similar fashion. If the 
 underlying problem becomes worse, the NN's ability to process a heartbeat 
 from a DN is severely impacted, thus causing the NN to declare that the DN is 
 dead. Then the NN starts replicating blocks that used to reside on the 
 now-declared-dead datanode. This adds extra load to the NN. Then the 
 now-declared-dead datanode finally re-establishes contact with the NN and sends a 
 block report. The block report processing on the NN is another heavyweight 
 activity, thus causing more load on the already overloaded namenode. 
 My proposal is that the NN should try its best to continue processing RPCs 
 from datanodes and give lesser priority to serving client requests. The 
 Datanode RPCs are integral to the consistency and performance of the Hadoop 
 file system, and it is better to protect them at all costs. This will ensure 
 that the NN recovers from the hiccup much faster than it does now.
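
 A purely illustrative sketch of the prioritization idea (standalone, not 
 Hadoop's RPC server; all names are made up):
{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative: keep datanode calls and client calls in separate queues and
// always drain datanode work first, so heartbeats and block reports are not
// starved by a flood of client RPCs during a transient slowdown.
public class PrioritizedCallQueue {
  private final LinkedBlockingQueue<Runnable> datanodeCalls =
      new LinkedBlockingQueue<Runnable>();
  private final LinkedBlockingQueue<Runnable> clientCalls =
      new LinkedBlockingQueue<Runnable>();

  public void submitDatanodeCall(Runnable call) { datanodeCalls.add(call); }
  public void submitClientCall(Runnable call) { clientCalls.add(call); }

  // Handler threads call this: prefer datanode work whenever any is pending.
  public Runnable take() throws InterruptedException {
    while (true) {
      Runnable call = datanodeCalls.poll();
      if (call != null) return call;
      call = clientCalls.poll();
      if (call != null) return call;
      // Nothing queued: wait briefly for datanode work, then re-check both.
      call = datanodeCalls.poll(10, TimeUnit.MILLISECONDS);
      if (call != null) return call;
    }
  }
}
{code}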

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-849) TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794235#action_12794235
 ] 

Hudson commented on HDFS-849:
-

Integrated in Hadoop-Hdfs-trunk-Commit #156 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/156/])
HDFS-849. TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails. Contributed 
by Hairong Kuang.


 TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 --

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794236#action_12794236
 ] 

Hudson commented on HDFS-775:
-

Integrated in Hadoop-Hdfs-trunk-Commit #156 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/156/])
HDFS-775. FSDataset calls getCapacity() twice


 FSDataset calls getCapacity() twice -bug?
 -

 Key: HDFS-775
 URL: https://issues.apache.org/jira/browse/HDFS-775
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-775-1.patch, HDFS-775-2.patch


 I'm not sure this is a bug or as intended, but I thought I'd mention it.
 FSDataset.getCapacity() calls DF.getCapacity() twice, when evaluating its 
 capacity. Although there is caching to stop the shell being exec'd twice in a 
 row, there is a risk that the first call doesn't run the shell, and the 
 second does -so the value changes during the method. 
 If that is not intended, it is better to cache the first value for the whole 
 method

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-12-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794248#action_12794248
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-775:
-

Hi Steve, did you forget to update CHANGES.txt?

 FSDataset calls getCapacity() twice -bug?
 -

 Key: HDFS-775
 URL: https://issues.apache.org/jira/browse/HDFS-775
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-775-1.patch, HDFS-775-2.patch


 I'm not sure whether this is a bug or intended, but I thought I'd mention it.
 FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
 capacity. Although there is caching to stop the shell being exec'd twice in a 
 row, there is a risk that the first call doesn't run the shell and the 
 second does, so the value changes during the method. 
 If that is not intended, it is better to cache the first value for the whole 
 method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-762) Trying to start the balancer throws a NPE

2009-12-23 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794261#action_12794261
 ] 

dhruba borthakur commented on HDFS-762:
---

The two failed tests are
 org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery.testGetNewStamp 
 org.apache.hadoop.hdfs.server.datanode.TestBlockReport.blockReport

and the only change in this patch is to the constructor of the 
Balancer. These test failures are in no way related to this patch. I am going to 
commit this patch.

 Trying to start the balancer throws a NPE
 -

 Key: HDFS-762
 URL: https://issues.apache.org/jira/browse/HDFS-762
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Cristian Ivascu
Assignee: Cristian Ivascu
 Fix For: 0.21.0

 Attachments: 0001-corrected-balancer-constructor.patch, HDFS-762.patch


 When trying to run the balancer, I get a NullPointerException:
 2009-11-10 11:08:14,235 ERROR 
 org.apache.hadoop.hdfs.server.balancer.Balancer: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:161)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.checkReplicationPolicyCompatibility(Balancer.java:784)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:792)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:814)
 This happens when trying to use bin/start-balancer or bin/hdfs balancer 
 -threshold 10
 The config files (hdfs-site and core-site) have fs.default.name set to 
 hdfs://namenode:9000.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-762) Trying to start the balancer throws a NPE

2009-12-23 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-762:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Cristian.

 Trying to start the balancer throws a NPE
 -

 Key: HDFS-762
 URL: https://issues.apache.org/jira/browse/HDFS-762
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Cristian Ivascu
Assignee: Cristian Ivascu
 Fix For: 0.21.0

 Attachments: 0001-corrected-balancer-constructor.patch, HDFS-762.patch


 When trying to run the balancer, I get a NullPointerException:
 2009-11-10 11:08:14,235 ERROR 
 org.apache.hadoop.hdfs.server.balancer.Balancer: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:161)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.checkReplicationPolicyCompatibility(Balancer.java:784)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:792)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:814)
 This happens when trying to use bin/start-balancer or bin/hdfs balancer 
 -threshold 10
 The config files (hdfs-site and core-site) have fs.default.name set to 
 hdfs://namenode:9000.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-94) The Heap Size in HDFS web ui may not be accurate

2009-12-23 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-94:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Dmytro.



 The Heap Size in HDFS web ui may not be accurate
 --

 Key: HDFS-94
 URL: https://issues.apache.org/jira/browse/HDFS-94
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Dmytro Molkov
 Fix For: 0.22.0

 Attachments: HDFS-94.patch


 It seems that the Heap Size shown in HDFS web UI is not accurate.  It keeps 
 showing 100% of usage.  e.g.
 {noformat}
 Heap Size is 10.01 GB / 10.01 GB (100%) 
 {noformat}
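
 For reference, a way to compute a usage figure from Runtime that distinguishes 
 bytes actually in use from the maximum heap (illustrative only; the committed 
 fix is in the attached patch):
{code}
// Illustrative: report bytes actually in use against the -Xmx ceiling,
// so the percentage only reads 100% when the heap really is full.
public class HeapReportSketch {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();  // bytes currently in use
    long max = rt.maxMemory();                       // configured -Xmx ceiling
    System.out.printf("Heap Size is %.2f GB / %.2f GB (%.0f%%)%n",
        used / 1e9, max / 1e9, 100.0 * used / max);
  }
}
{code}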

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-767) Job failure due to BlockMissingException

2009-12-23 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-767:
--

Status: Open  (was: Patch Available)

Hi Ning, can we get a few more minor issues fixed:

{quote}
   - * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
+ * Licensed the Apache Software Foundation (ASF) under one
+ * or more contributor license See.
+ * agreements the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
{quote}

The above change should be reverted.

{quote}
   prefetchSize = 
conf.getLong(DFSConfigKeys.DFS_CLIENT_READ_PREFETCH_SIZE_KEY, prefetchSize);
+  timeWindow = 
conf.getInt("dfs.client.baseTimeWindow.waitOn.BlockMissingException", 3000);
{quote}

can we add the new configuration value to DFSConfigKeys?
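
A rough sketch of what that might look like (the constant names and default 
below are made up for illustration, not taken from the patch):
{code}
// Hypothetical sketch of the DFSConfigKeys suggestion above.
public class DFSConfigKeysSketch {
  public static final String DFS_CLIENT_BASE_TIME_WINDOW_KEY =
      "dfs.client.baseTimeWindow.waitOn.BlockMissingException";
  public static final int DFS_CLIENT_BASE_TIME_WINDOW_DEFAULT = 3000;
}

// DFSClient would then read the setting through the constants:
//   timeWindow = conf.getInt(DFSConfigKeysSketch.DFS_CLIENT_BASE_TIME_WINDOW_KEY,
//                            DFSConfigKeysSketch.DFS_CLIENT_BASE_TIME_WINDOW_DEFAULT);
{code}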

{quote}
+// See JIRA HDFS-767 for more details.
{quote}
Remove the above comment because this is already captured in the svn revision 
history

Thanks a bunch


 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HDFS-767.patch, HDFS-767_2.patch


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because it exceeds the upper limit (I 
 think 256 by default) on the number of threads accessing the same block at the 
 same time. The DFSClient will catch that exception and retry 3 times after 
 waiting for 3 seconds. Since the wait time is a fixed value, a lot of clients 
 will retry at about the same time and a large portion of them get another 
 failure. After 3 retries, about 256*4 = 1024 clients have gotten the block. 
 If the number of clients is more than that, the job will fail. 
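
 A sketch of the general remedy pursued here (the exact backoff policy is in 
 the attached patches; this only illustrates randomized, growing wait windows):
{code}
import java.util.Random;

// Illustrative: instead of a fixed 3-second wait, wait a random amount of time
// inside a window that grows with each failed attempt, so thousands of clients
// do not all retry at the same instant.
public class RetryBackoffSketch {
  public static void main(String[] args) throws InterruptedException {
    Random rand = new Random();
    int baseWindowMs = 3000;
    for (int attempt = 1; attempt <= 3; attempt++) {
      long waitMs = (long) (rand.nextDouble() * baseWindowMs * attempt);
      System.out.println("attempt " + attempt + ": waiting " + waitMs + " ms");
      Thread.sleep(waitMs);
      // ... re-issue the block read here ...
    }
  }
}
{code}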

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-445) pread() fails when cached block locations are no longer valid

2009-12-23 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-445:
--

Attachment: HDFS-445-0_20.2.patch

Patch for Hadoop 20 added.

 pread() fails when cached block locations are no longer valid
 -

 Key: HDFS-445
 URL: https://issues.apache.org/jira/browse/HDFS-445
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kan Zhang
Assignee: Kan Zhang
 Fix For: 0.21.0

 Attachments: 445-06.patch, 445-08.patch, HDFS-445-0_20.2.patch


 when cached block locations are no longer valid (e.g., datanodes restart on 
 different ports), pread() will fail, whereas normal read() still succeeds 
 through re-fetching of block locations from namenode (up to a max number of 
 times). 
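
 The shape of the retry described above, as a generic illustration (interfaces 
 and names are invented, not from the patch):
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative: on a failed read, invalidate the cached block locations and
// re-fetch them before the next attempt, up to a bounded number of retries,
// which is the behaviour the normal read() path already has and pread() lacks.
public class RefetchRetrySketch {
  interface LocationCache { void invalidateAndRefetch() throws IOException; }

  static <T> T readWithRefetch(Callable<T> readOp, LocationCache cache,
                               int maxRetries) throws Exception {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return readOp.call();
      } catch (IOException e) {
        last = e;                        // likely a stale location
        cache.invalidateAndRefetch();    // ask the namenode again
      }
    }
    throw last;
  }
}
{code}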

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-767) Job failure due to BlockMissingException

2009-12-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HDFS-767:


Status: Patch Available  (was: In Progress)

 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HDFS-767.patch, HDFS-767_2.patch, HDFS-767_3.patch


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because it exceeds the upper limit (I 
 think 256 by default) on the number of threads accessing the same block at the 
 same time. The DFSClient will catch that exception and retry 3 times after 
 waiting for 3 seconds. Since the wait time is a fixed value, a lot of clients 
 will retry at about the same time and a large portion of them get another 
 failure. After 3 retries, about 256*4 = 1024 clients have gotten the block. 
 If the number of clients is more than that, the job will fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-767) Job failure due to BlockMissingException

2009-12-23 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HDFS-767:


Attachment: HDFS-767_3.patch

HDFS-767_3.patch contains all the changes suggested by Dhruba. 

 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HDFS-767.patch, HDFS-767_2.patch, HDFS-767_3.patch


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because it exceeds the upper limit (I 
 think 256 by default) on the number of threads accessing the same block at the 
 same time. The DFSClient will catch that exception and retry 3 times after 
 waiting for 3 seconds. Since the wait time is a fixed value, a lot of clients 
 will retry at about the same time and a large portion of them get another 
 failure. After 3 retries, about 256*4 = 1024 clients have gotten the block. 
 If the number of clients is more than that, the job will fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-767) Job failure due to BlockMissingException

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794345#action_12794345
 ] 

Hadoop QA commented on HDFS-767:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428890/HDFS-767_3.patch
  against trunk revision 893650.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/console

This message is automatically generated.

 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HDFS-767.patch, HDFS-767_2.patch, HDFS-767_3.patch


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because it exceeds the upper limit (I 
 think 256 by default) on the number of threads accessing the same block at the 
 same time. The DFSClient will catch that exception and retry 3 times after 
 waiting for 3 seconds. Since the wait time is a fixed value, a lot of clients 
 will retry at about the same time and a large portion of them get another 
 failure. After 3 retries, about 256*4 = 1024 clients have gotten the block. 
 If the number of clients is more than that, the job will fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-94) The Heap Size in HDFS web ui may not be accurate

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794347#action_12794347
 ] 

Hudson commented on HDFS-94:


Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])
HDFS-94. The Heap Size printed in the NameNode WebUI is accurate.
(Dmytro Molkov via dhruba)


 The Heap Size in HDFS web ui may not be accurate
 --

 Key: HDFS-94
 URL: https://issues.apache.org/jira/browse/HDFS-94
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Dmytro Molkov
 Fix For: 0.22.0

 Attachments: HDFS-94.patch


 It seems that the Heap Size shown in HDFS web UI is not accurate.  It keeps 
 showing 100% of usage.  e.g.
 {noformat}
 Heap Size is 10.01 GB / 10.01 GB (100%) 
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-564) Adding pipeline test 17-35

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794349#action_12794349
 ] 

Hudson commented on HDFS-564:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])


 Adding pipeline test 17-35
 --

 Key: HDFS-564
 URL: https://issues.apache.org/jira/browse/HDFS-564
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.21.0
Reporter: Kan Zhang
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.21.0, 0.22.0

 Attachments: h564-24.patch, h564-25.patch, pipelineTests.patch, 
 pipelineTests1.patch, pipelineTests2.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-762) Trying to start the balancer throws a NPE

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794346#action_12794346
 ] 

Hudson commented on HDFS-762:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])
HDFS-762. Balancer causes Null Pointer Exception.
(Cristian Ivascu via dhruba)
HDFS-762. Balancer causes Null Pointer Exception. 
(Cristian Ivascu via dhruba)


 Trying to start the balancer throws a NPE
 -

 Key: HDFS-762
 URL: https://issues.apache.org/jira/browse/HDFS-762
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Cristian Ivascu
Assignee: Cristian Ivascu
 Fix For: 0.21.0

 Attachments: 0001-corrected-balancer-constructor.patch, HDFS-762.patch


 When trying to run the balancer, I get a NullPointerException:
 2009-11-10 11:08:14,235 ERROR 
 org.apache.hadoop.hdfs.server.balancer.Balancer: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:161)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.checkReplicationPolicyCompatibility(Balancer.java:784)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:792)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:814)
 This happens when trying to use bin/start-balancer or bin/hdfs balancer 
 -threshold 10
 The config files (hdfs-site and core-site) have fs.default.name set to 
 hdfs://namenode:9000.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-849) TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794350#action_12794350
 ] 

Hudson commented on HDFS-849:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])
HDFS-849. TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails. Contributed 
by Hairong Kuang.


 TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails
 --

 Key: HDFS-849
 URL: https://issues.apache.org/jira/browse/HDFS-849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.1
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.21.0, 0.22.0

 Attachments: countDown.patch


 .TestFiDataTransferProtocol2#pipeline_Fi_18 sometimes fails with the 
 following error:
 junit.framework.AssertionFailedError: 
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.runTest17_19(TestFiDataTransferProtocol2.java:139)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestFiDataTransferProtocol2.pipeline_Fi_18(TestFiDataTransferProtocol2.java:186)
 This means that the test did not trigger pipeline recovery. The test log 
 shows that no fault was injected into the pipeline. It turns out there is 
 a bug in the test code: counting down from 3 means a fault is injected when 
 the fourth packet is received, but the test writes a file of only 3 packets. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794348#action_12794348
 ] 

Hudson commented on HDFS-101:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])


 DFS write pipeline : DFSClient sometimes does not detect second datanode 
 failure 
 -

 Key: HDFS-101
 URL: https://issues.apache.org/jira/browse/HDFS-101
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.20.2, 0.21.0, 0.22.0

 Attachments: detectDownDN-0.20.patch, detectDownDN1-0.20.patch, 
 detectDownDN2.patch, detectDownDN3-0.20.patch, detectDownDN3.patch, 
 hdfs-101.tar.gz


 When the first datanode's write to the second datanode fails or times out, 
 DFSClient ends up marking the first datanode as the bad one and removes it from 
 the pipeline. A similar problem exists on the DataNode as well, and it is fixed in 
 HADOOP-3339. From HADOOP-3339: 
 The main issue is that the BlockReceiver thread (and DataStreamer in the case of 
 DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty 
 coarse control. We don't know what state the responder is in, and interrupting 
 has different effects depending on the responder's state. To fix this properly we 
 need to redesign how we handle these interactions.
 When the first datanode closes its socket from DFSClient, DFSClient should 
 properly read all the data left in the socket. Also, the DataNode's closing of 
 the socket should not result in a TCP reset, otherwise I think DFSClient will 
 not be able to read from the socket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794351#action_12794351
 ] 

Hudson commented on HDFS-814:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])
HDFS-814. Add an api to get the visible length of a DFSDataInputStream.


 Add an api to get the visible length of a DFSDataInputStream.
 -

 Key: HDFS-814
 URL: https://issues.apache.org/jira/browse/HDFS-814
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.21.0, 0.22.0

 Attachments: h814_20091221.patch, h814_20091221_0.21.patch


 Hflush guarantees that the bytes written before are visible to the new 
 readers.  However, there is no way to get the length of the visible bytes.  
 The visible length is useful in some applications like SequenceFile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-775) FSDataset calls getCapacity() twice -bug?

2009-12-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794352#action_12794352
 ] 

Hudson commented on HDFS-775:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #159 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/159/])
HDFS-775. FSDataset calls getCapacity() twice


 FSDataset calls getCapacity() twice -bug?
 -

 Key: HDFS-775
 URL: https://issues.apache.org/jira/browse/HDFS-775
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-775-1.patch, HDFS-775-2.patch


 I'm not sure whether this is a bug or intended, but I thought I'd mention it.
 FSDataset.getCapacity() calls DF.getCapacity() twice when evaluating its 
 capacity. Although there is caching to stop the shell being exec'd twice in a 
 row, there is a risk that the first call doesn't run the shell and the 
 second does, so the value changes during the method. 
 If that is not intended, it is better to cache the first value for the whole 
 method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.