[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994199#comment-12994199 ]

ryan rawson commented on HDFS-347:
----------------------------------

In a test with 15 threads of HBase clients, latency drops from 12.1 ms to 6.9 ms with this patch. Based on my report to the user@hbase list, there are a few people pulling down my patched Hadoop variant who want to test and run with it. Going by the iceberg theory of interest, this is one of the most in-demand things I've seen in a while -- people want it NOW.

DFS read performance suboptimal when client co-located on nodes with data
-------------------------------------------------------------------------

Key: HDFS-347
URL: https://issues.apache.org/jira/browse/HDFS-347
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: George Porter
Assignee: Todd Lipcon
Attachments: BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-branch-20-append.txt, all.tsv, hdfs-347.png, hdfs-347.txt, local-reads-doc

One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected. After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is that the HDFS streaming protocol causes many more read I/O operations (iops) than necessary.

Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) on the same machine. The DataNode satisfies the single disk-block request by sending data back to the HDFS client in 64 KB chunks. In BlockSender.java this is done in the sendChunk() method, which relies on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write() calls. In either case, each chunk is read from the disk with a separate I/O operation, so the single request for a 64 MB block ends up hitting the disk as over a thousand smaller 64 KB requests. Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also incurs context switches each time network packets are sent (each 64 KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block-send operation.

I'd like to get some feedback on the best way to address this, but I think the answer is a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode and marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same Hadoop configuration, the DFSClient should be able to find the files holding the block data, open them directly, and read the data itself. This would avoid the context switches imposed by the network layer and would allow much larger read buffers than 64 KB, which should reduce the number of iops per block read.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
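The chunk-size arithmetic in the issue description above can be made concrete with a back-of-the-envelope sketch (this is not HDFS code; the 4 MB "large buffer" figure is illustrative):

```java
// Back-of-the-envelope sketch, not HDFS code: how many read operations a
// 64 MB block costs at different buffer sizes. Fewer reads means fewer
// iops against the disk, which is the core of the complaint above.
public class LocalReadIops {
    // Reads needed to cover blockSize bytes at the given buffer size
    // (ceiling division).
    static long readsNeeded(long blockSize, int bufferSize) {
        return (blockSize + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;                          // one 64 MB HDFS block
        System.out.println(readsNeeded(block, 64 * 1024));       // 64 KB chunks: 1024 reads
        System.out.println(readsNeeded(block, 4 * 1024 * 1024)); // 4 MB buffer: 16 reads
    }
}
```

A local reader that opens the block file itself could pick whatever buffer size it likes, which is where the "over a thousand smaller requests" collapse to a handful.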
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992766#comment-12992766 ]

ryan rawson commented on HDFS-347:
----------------------------------

dhruba, I am not seeing the file src/hdfs/org/apache/hadoop/hdfs/metrics/DFSClientMetrics.java in branch-20-append (nor cdh3b2). I also got a number of rejects; here are some highlights:

ClientDatanodeProtocol: your variant has copyBlock, ours does not (hence the rej).
Misc field differences in DFSClient, including the metrics object.

After resolving them I was able to get it up and going. I'm not able to get the unit test to pass; I'm guessing it's this:

2011-02-09 14:35:49,926 DEBUG hdfs.DFSClient (DFSClient.java:fetchBlockByteRange(1927)) - fetchBlockByteRange shortCircuitLocalReads true localhst h132.sfo.stumble.net/10.10.1.132 targetAddr /127.0.0.1:62665

Since we don't recognize that we are 'local', we take the normal read path, which is failing. Any tips?
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992786#comment-12992786 ]

ryan rawson commented on HDFS-347:
----------------------------------

Applying this patch to branch-20-append, the unit test passes. Still trying to figure out why it works in one place and not the other. The patch is pretty dang simple too.
[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992801#comment-12992801 ]

ryan rawson commented on HDFS-347:
----------------------------------

OK, this was my bad -- I applied the patch wrong. The unit test passes. I'll attach a patch for others.
[jira] Updated: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-347:
-----------------------------

Attachment: HDFS-347-branch-20-append.txt

Applies to head of branch-20-append.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900605#action_12900605 ]

ryan rawson commented on HDFS-1052:
-----------------------------------

Have you considered merely increasing the heap size? Switch to a no-pause GC collector; one can be found at http://www.managedruntime.org/ Right now a machine can have 256 GB of RAM for ~$10,000. That is a 4x increase over what we have now. Added bonus: no additional complexity!

HDFS scalability with multiple namenodes
----------------------------------------

Key: HDFS-1052
URL: https://issues.apache.org/jira/browse/HDFS-1052
Project: Hadoop HDFS
Issue Type: New Feature
Components: name-node
Affects Versions: 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Attachments: Block pool proposal.pdf, Mulitple Namespaces5.pdf

HDFS currently uses a single namenode, which limits the scalability of the cluster. This jira proposes an architecture to scale the nameservice horizontally using multiple namenodes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1320) Add LOG.isDebugEnabled() guard for each LOG.debug(...)
[ https://issues.apache.org/jira/browse/HDFS-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892557#action_12892557 ]

ryan rawson commented on HDFS-1320:
-----------------------------------

Does the JVM not optimize for this case in the fast path?

Add LOG.isDebugEnabled() guard for each LOG.debug(...)
------------------------------------------------------

Key: HDFS-1320
URL: https://issues.apache.org/jira/browse/HDFS-1320
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Erik Steffl
Fix For: 0.22.0
Attachments: HDFS-1320-0.22.patch

Each LOG.debug(...) should be executed only if LOG.isDebugEnabled() is true; in some cases it's expensive to construct the string that is being printed to the log. It's much easier to always use LOG.isDebugEnabled(), because that is easier to check than reasoning in each case about whether the guard is necessary.
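On the fast-path question: in general the JIT cannot skip the work, because Java evaluates method arguments eagerly, and building the message string may involve calls with side effects. A minimal sketch with a stand-in logger (not the real commons-logging Log) shows what the guard buys:

```java
// Stand-in logger, not the real commons-logging Log: demonstrates that the
// unguarded call pays for message construction even with debug logging off.
public class DebugGuardDemo {
    static class Log {
        boolean isDebugEnabled() { return false; } // debug logging is off
        void debug(Object msg) { /* message discarded */ }
    }

    static int expensiveCalls = 0;
    static String expensiveState() { expensiveCalls++; return "huge state dump"; }

    public static void main(String[] args) {
        Log log = new Log();

        // Unguarded: the argument string is built before debug() is called,
        // so expensiveState() runs even though the message is discarded.
        log.debug("state: " + expensiveState());

        // Guarded: the argument expression is never evaluated.
        if (log.isDebugEnabled()) {
            log.debug("state: " + expensiveState());
        }

        System.out.println(expensiveCalls); // only the unguarded call ran
    }
}
```

The JIT can sometimes inline and eliminate trivially dead argument construction, but it cannot do so when the construction has observable effects, which is exactly the expensive case the issue is about.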
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848562#action_12848562 ]

ryan rawson commented on HDFS-1052:
-----------------------------------

This sounds great! Also, as part of the architecture, can you explain how you will improve availability?
[jira] Commented: (HDFS-1051) Umbrella Jira for Scaling the HDFS Name Service
[ https://issues.apache.org/jira/browse/HDFS-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847653#action_12847653 ]

ryan rawson commented on HDFS-1051:
-----------------------------------

Perhaps we could think about using Zab from ZooKeeper to keep a cluster of NewNameNodes in sync all the time? Instead of doing failure detection, failover, and recovery, we could keep 2N+1 nodes always up to date. No recovery would be necessary during node failover, and rolling restarts and machine moves would be doable (just like in ZK). This could reduce the deployment complexity over some of the other options.

For metadata scalability in RAM, a possibility would be to leverage flash disk in some capacity. Swap, mmap, or explicit local files in flash would be reasonably fast and could extend the serviceable life of the "one machine holds all metadata" architecture.

Umbrella Jira for Scaling the HDFS Name Service
-----------------------------------------------

Key: HDFS-1051
URL: https://issues.apache.org/jira/browse/HDFS-1051
Project: Hadoop HDFS
Issue Type: New Feature
Affects Versions: 0.22.0
Reporter: Sanjay Radia
Assignee: Sanjay Radia
Fix For: 0.22.0

The HDFS Name service currently uses a single Namenode, which limits its scalability. This is a master jira to track sub-jiras to address this problem.
[jira] Commented: (HDFS-1051) Umbrella Jira for Scaling the HDFS Name Service
[ https://issues.apache.org/jira/browse/HDFS-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847655#action_12847655 ]

ryan rawson commented on HDFS-1051:
-----------------------------------

One other note: most people have problems with the availability of the NameNode, not the scalability of the cluster (i.e., medium data rather than big data). I would argue that availability of the namenode is the highest priority and is holding HDFS back from widespread adoption.
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846525#action_12846525 ]

ryan rawson commented on HDFS-1002:
-----------------------------------

We create the directory first; here is the relevant code (StoreFile.java:398 in HBase):

if (!fs.exists(dir)) {
  fs.mkdirs(dir);
}
Path path = getUniqueFile(fs, dir);
return new HFile.Writer(fs, path, blocksize,
    algorithm == null ? HFile.DEFAULT_COMPRESSION_ALGORITHM : algorithm,
    c == null ? KeyValue.KEY_COMPARATOR : c);

In this case getUniqueFile generates that 9207375265821366914 filename. The constructor of HFile.Writer() will create the file using fs.create(path). Also, 'dir' == /hbase/stumbles_by_userid/compaction.dir/378232123 in this context. So yes, we create the directory before creating the file.

Secondary Name Node crash, NPE in edit log replay
-------------------------------------------------

Key: HDFS-1002
URL: https://issues.apache.org/jira/browse/HDFS-1002
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: ryan rawson
Fix For: 0.21.0
Attachments: snn_crash.tar.gz, snn_log.txt

An NPE in the SNN; the core of the message looks like so:

2010-02-25 11:54:05,834 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1164)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1067)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:511)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:401)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:368)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1172)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:594)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:476)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:353)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:317)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:219)
at java.lang.Thread.run(Thread.java:619)

This happens even if I restart the SNN over and over again.
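For reference, the order of operations described in that comment, sketched in plain java.io terms (illustrative only; the real code goes through Hadoop's FileSystem and HFile.Writer, and the method names below are hypothetical):

```java
import java.io.File;
import java.io.IOException;

// Plain-java sketch of the ordering in StoreFile.java: the directory is
// created first, then a unique file inside it. Names are illustrative.
public class CreateOrderSketch {
    static File writeNewStoreFile(File dir, String name) throws IOException {
        if (!dir.exists()) {
            dir.mkdirs();                // directory before file, as in StoreFile.java
        }
        File path = new File(dir, name); // stands in for getUniqueFile()
        path.createNewFile();            // stands in for fs.create() in HFile.Writer
        return path;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "compaction.dir");
        File f = writeNewStoreFile(dir, "9207375265821366914");
        System.out.println(f.exists());
    }
}
```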
[jira] Commented: (HDFS-265) Revisit append
[ https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846704#action_12846704 ]

ryan rawson commented on HDFS-265:
----------------------------------

The head of the mail conversation is here: http://mail-archives.apache.org/mod_mbox/hadoop-general/201002.mbox/dfe484f01002161412h8bc953axee2a73d81a234...@mail.gmail.com

There was no close to the discussion. At this point there do not seem to be any plans for Hadoop 0.21.

Revisit append
--------------

Key: HDFS-265
URL: https://issues.apache.org/jira/browse/HDFS-265
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.21.0
Attachments: a.sh, appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, appendDesign2.pdf, AppendSpec.pdf, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, TestPlanAppend.html

HADOOP-1700 and related issues put a lot of effort into providing the first implementation of append. However, append is a complex feature, and it turns out that there are issues that initially seemed trivial but need careful design. This jira revisits append, aiming for a design and implementation that support semantics acceptable to its users.
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844195#action_12844195 ]

ryan rawson commented on HDFS-918:
----------------------------------

This sounds great, guys. For HBase, the read path matters more than the write path. I would love to test this for HBase, and we could postpone/delay the write-path code.

Use single Selector and small thread pool to replace many instances of BlockSender for reads
--------------------------------------------------------------------------------------------

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: Jay Booth
Fix For: 0.22.0
Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-multiplex.patch

Currently, on read requests, the DataXceiver server allocates a new thread per request, and each thread must allocate its own buffers; this leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small thread pool to multiplex request packets, we could theoretically achieve higher performance while taking up fewer resources and leaving more CPU on datanodes available for mapred, HBase, or whatever. This can be done without changing any wire protocols.
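A rough sketch of the resource argument (illustrative numbers, not the HDFS-918 patch): a small fixed pool serves many block-read requests with a bounded number of threads and buffers, whereas thread-per-request allocates one of each per request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not the HDFS-918 patch: a fixed pool of a few
// threads serves many simulated block-read requests, versus the
// thread-per-request model where each request gets its own thread.
public class ReadPoolSketch {
    // Serves `requests` simulated reads on `threads` pooled threads and
    // returns how many were served.
    static int serve(int requests, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger served = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                served.incrementAndGet(); // stand-in for sending one block
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return served.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(100, 4)); // 4 threads handle all 100 requests
    }
}
```

The actual proposal goes further by using a single NIO Selector so even the pooled threads only run when a socket is writable, but the bounded-resources point is the same.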
[jira] Commented: (HDFS-1028) INode.getPathNames could split more efficiently
[ https://issues.apache.org/jira/browse/HDFS-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842215#action_12842215 ]

ryan rawson commented on HDFS-1028:
-----------------------------------

+1. tsuna on our staff found that in a string-intensive processing mini-app, he was able to speed his app up by a factor of 2 by not using that API. Also, PrintWriter is many times faster than PrintStream.

INode.getPathNames could split more efficiently
-----------------------------------------------

Key: HDFS-1028
URL: https://issues.apache.org/jira/browse/HDFS-1028
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Todd Lipcon
Priority: Minor

INode.getPathNames uses String.split(String), which actually uses the full Java regex implementation. Since we're always splitting on a single char, we could implement a faster one, like StringUtils.split() (but without the escape character). This takes a significant amount of CPU during FSImage loading, so it should be a worthwhile speedup.
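A sketch of what such a single-char splitter could look like (the method name is illustrative, not the HDFS one): walk the string with indexOf(), avoiding the regex machinery that String.split(String) invokes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative single-char splitter: no regex compilation, just indexOf()
// and substring(). Not the actual HDFS implementation.
public class FastSplitSketch {
    static String[] splitOnChar(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = s.indexOf(sep); i >= 0; i = s.indexOf(sep, start)) {
            parts.add(s.substring(start, i)); // component before the separator
            start = i + 1;
        }
        parts.add(s.substring(start));        // final component
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String path = "/hbase/table/region/file";
        // Matches String.split for typical paths (leading '/' yields "").
        System.out.println(Arrays.equals(splitOnChar(path, '/'), path.split("/")));
    }
}
```

One edge case to settle in a real implementation: unlike String.split, this version keeps trailing empty strings, so a path ending in '/' would differ; HDFS paths are normalized, but the choice should be made explicitly.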
[jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838497#action_12838497 ]

ryan rawson commented on HDFS-686:
----------------------------------

Why are you closing this as fixed when it clearly isn't?

NullPointerException is thrown while merging edit log and image
---------------------------------------------------------------

Key: HDFS-686
URL: https://issues.apache.org/jira/browse/HDFS-686
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1, 0.21.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Priority: Blocker
Attachments: nullSetTime.patch, openNPE-trunk.patch, openNPE.patch

Our secondary name node is not able to start, failing on a NullPointerException:

ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
at java.lang.Thread.run(Thread.java:619)

This was caused by setting the access time on a non-existent file.
[jira] Created: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
Secondary Name Node crash, NPE in edit log replay
-------------------------------------------------

Key: HDFS-1002
URL: https://issues.apache.org/jira/browse/HDFS-1002
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: ryan rawson
Fix For: 0.21.0
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838589#action_12838589 ]

ryan rawson commented on HDFS-1002:

I forgot, this is a hadoop 0.21 build, based on SVN revision 901028. It's from http://github.com/apache/hadoop-hdfs/commit/de5d50187a34d35e5e1a9ea8bfa1a2fedb9a7df4 which is actually https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0...@901028

Inspecting the code on my systems also verifies that yes, we have this commit: http://svn.apache.org/viewvc/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java?p2=/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java&p1=/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java&r1=824955&r2=824954&view=diff&pathrev=824955

Attachments: snn_crash.tar.gz, snn_log.txt
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838626#action_12838626 ]

ryan rawson commented on HDFS-1002:

The edits log is 17 MB; it is included in the attachments here.
[jira] Commented: (HDFS-1002) Secondary Name Node crash, NPE in edit log replay
[ https://issues.apache.org/jira/browse/HDFS-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838640#action_12838640 ]

ryan rawson commented on HDFS-1002:

My JVM has this in its lsof:

java 10109 hadoop mem REG 8,3 832194 205563 /home/hadoop/hadoop-0.21/hdfs/hadoop-hdfs-0.21.0-SNAPSHOT.jar

and that build is most definitely from the 0.21 branch with the aforementioned revision. Unless the gremlins have come out again and changed the bits on me...
[jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835100#action_12835100 ]

ryan rawson commented on HDFS-686:

Hey guys, I ran into this on branch-0.21, probably because the patch contained in the file nullSetTime.patch was never committed! Why would this issue be closed?!
[jira] Updated: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-686:

Priority: Blocker (was: Major)
Affects Version/s: 0.21.0

NullPointerException is thrown while merging edit log and image
Key: HDFS-686
Fix For: 0.20.2, 0.21.0
[jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
[ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835101#action_12835101 ]

ryan rawson commented on HDFS-686:

Here is my stack trace:

2010-02-13 21:46:54,748 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Throwable Exception in doCheckpoint:
2010-02-13 21:46:54,749 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1399)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1387)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:672)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:401)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:368)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1172)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:594)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:476)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:353)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:317)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:219)
    at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831696#action_12831696 ]

ryan rawson commented on HDFS-918:

Great work, but Todd is dead on: we need to look at the scenario of a few hundred small chunked reads. It's currently where we hurt badly, and one of the things this patch can hopefully address. I would be happy to test this out when you are ready.

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: Jay Booth
Fix For: 0.22.0
Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-multiplex.patch

Currently, on read requests, the DataXceiver server allocates a new thread per request, which must allocate its own buffers; this leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small thread pool to multiplex request packets, we could theoretically achieve higher performance while taking up fewer resources and leaving more CPU on datanodes available for mapred, hbase, or whatever. This can be done without changing any wire protocols.
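The resource argument in HDFS-918, a small shared pool serving many read requests instead of one dedicated thread and buffer per request, can be illustrated with a toy sketch. All names below are illustrative; this is not the patch's actual BlockSender or selector code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the proposal: a fixed pool of a few threads multiplexes many
// read requests, rather than spawning one thread per request as the
// thread-per-xceiver model does. Thread count stays bounded regardless of
// how many concurrent requests arrive.
public class PooledReads {
    public static int serve(int requests, int poolThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        AtomicInteger served = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                served.incrementAndGet(); // stand-in for sending one chunk
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return served.get();
    }
}
```

With 200 requests and 4 pool threads, all 200 complete while only 4 sender threads (and their buffers) ever exist, which is the memory/CPU saving the JIRA describes.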
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829293#action_12829293 ]

ryan rawson commented on HDFS-918:

I have done some thinking about HBase performance in relation to HDFS, and right now we are bottlenecking on reads to a single file. By moving to a re-entrant API (pread) we are looking to unleash the parallelism. This is important, I think, because we want to push as many parallel reads as possible from our clients down into the datanode and then down into the kernel, to benefit from the I/O scheduling in the kernel and hardware. This could mean literally dozens of parallel reads per node on a busy cluster. Perhaps even hundreds, per node. To ensure scalability we'd probably want to get away from the xceiver model, for more than one reason... If I remember correctly, xceivers not only consume threads (hundreds of threads is OK but not ideal) but also consume epolls, and there are only so many epolls available. So I heartily approve of the direction of this JIRA!
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829368#action_12829368 ]

ryan rawson commented on HDFS-918:

The problem was that we were previously using a stateful interface, because it was faster in scan tests, so we serialized reads within one regionserver to any given HFile. With multiple client handler threads asking for different parts of a large file, we get serialized behaviour, which hurts random-get performance. So we are moving back to pread, which means we will get more parallelism - depending on your table read pattern, of course. But I want to get even more parallelism, by preading multiple hfiles during a scan/get for example. This will just increase the thread pressure on the datanode.
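The parallelism gain from pread comes from positioned reads being stateless: there is no shared file pointer to serialize on, so concurrent handler threads can read different offsets of one open file at once. A sketch of that property using java.nio's FileChannel as a stand-in for HDFS's positioned-read API (an analogy only, not HDFS code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// A positioned read takes an explicit offset instead of advancing a shared
// cursor, so it is re-entrant: many threads can call it on the same file
// concurrently without serializing, which is the behaviour pread gives
// HBase's handler threads against an HFile.
public class PReadDemo {
    public static byte[] preadAt(Path file, long pos, int len) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            while (buf.hasRemaining()) {
                // read at an absolute position; channel position is untouched
                if (ch.read(buf, pos + buf.position()) < 0) {
                    break; // hit EOF before filling the buffer
                }
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```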
[jira] Commented: (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799464#action_12799464 ]

ryan rawson commented on HDFS-826:

Why wouldn't DFSClient just handle the write-pipeline problems and move on? Otherwise it sounds like, if the client gets an error, it must close the file, delete the file, then re-try again? This doesn't sound like a robust filesystem to me...

Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

Key: HDFS-826
URL: https://issues.apache.org/jira/browse/HDFS-826
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs client
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: ReplicableHdfs.txt

HDFS does not replicate the last block of a file that is currently being written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written file. This situation can be improved if the application can be notified of a datanode death in the write pipeline; the application can then decide the right course of action to take on this event. In our use-case, the application can close the file on the first datanode death and start writing to a newly created file. This ensures that the reliability guarantee of a block stays close to 3 at all times. One idea is to make DFSOutputStream.write() throw an exception if the number of datanodes in the write pipeline falls below the minimum.replication.factor that is set on the client (this is backward compatible).
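The backward-compatible idea in the description, making write() fail once the pipeline shrinks below a client-side minimum so the application can close the file and open a new one, can be sketched as follows. All names here are hypothetical, not the actual DFSOutputStream implementation:

```java
import java.io.IOException;

// Hypothetical sketch: the writer tracks how many datanodes remain alive in
// its pipeline; once that drops below the client-configured minimum
// replication, write() throws, notifying the application of the datanode
// death so it can close this file and start a new one.
public class PipelineWriter {
    private int liveDatanodes;
    private final int minReplication;

    public PipelineWriter(int liveDatanodes, int minReplication) {
        this.liveDatanodes = liveDatanodes;
        this.minReplication = minReplication;
    }

    // Called when a pipeline datanode is detected dead.
    public void datanodeDied() {
        liveDatanodes--;
    }

    public void write(byte[] chunk) throws IOException {
        if (liveDatanodes < minReplication) {
            throw new IOException("pipeline has " + liveDatanodes
                + " datanodes, below minimum " + minReplication);
        }
        // ... stream the chunk to the remaining pipeline ...
    }
}
```

Because existing applications that never configure a minimum (or set it to 1) see no new exceptions, the change is backward compatible in the sense the description claims.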
[jira] Updated: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable
[ https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-127:

Attachment: HDFS-127-v2.patch

Here is a fixed version for hdfs branch-0.21.

DFSClient block read failures cause open DFSInputStream to become unusable

Key: HDFS-127
URL: https://issues.apache.org/jira/browse/HDFS-127
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Igor Bolotin
Attachments: 4681.patch, h127_20091016.patch, HDFS-127-v2.patch

We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3. When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:

2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-0/20081110-163426/_0.tis
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
    at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
    at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
    ...

The investigation showed that the root of this issue is that we exceeded the number of xcievers in the data nodes, and that was fixed by changing the configuration setting to 2k. However, one thing that bothered me was that even after the datanodes recovered from overload and most of the client servers had been shut down, we still observed errors in the logs of running servers. Further investigation showed that the fix for HADOOP-1911 introduced another problem: the DFSInputStream instance might become unusable once the number of failures over the lifetime of the instance exceeds a configured threshold. The fix for this specific issue seems to be trivial - just reset the failure counter before reading the next block (patch will be attached shortly). This also seems to be related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.
[jira] Updated: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable
[ https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HDFS-127:

Attachment: (was: HDFS-127-v2.patch)
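The "trivial fix" the reporter describes, resetting the failure counter before each new block rather than accumulating failures over the stream's whole lifetime, can be sketched with a toy model (illustrative names only, not the actual DFSClient code):

```java
// Toy model of the HDFS-127 fix: count read failures per block attempt,
// not cumulatively per stream, so a long-lived input stream survives
// transient datanode errors instead of becoming permanently unusable once
// a lifetime threshold is crossed.
public class BlockRetryReader {
    // Stand-in for one attempt to read a block replica from a datanode.
    public interface BlockSource {
        boolean readOnce(); // true on success, false on a transient failure
    }

    // Reads every block, allowing up to maxFailures attempts per block.
    // The one-line essence of the fix is resetting `failures` inside the
    // loop, before each new block.
    public static int readAll(BlockSource[] blocks, int maxFailures) {
        int blocksRead = 0;
        for (BlockSource block : blocks) {
            int failures = 0; // reset per block, not per stream
            while (true) {
                if (block.readOnce()) {
                    blocksRead++;
                    break;
                }
                if (++failures >= maxFailures) {
                    throw new RuntimeException("could not obtain block");
                }
            }
        }
        return blocksRead;
    }
}
```

With a lifetime counter, five blocks that each fail once would exhaust a threshold of two after the second block; with the per-block reset, all five succeed.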
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758946#action_12758946 ]

ryan rawson commented on HDFS-200:

I'm not sure exactly what I'm seeing; here is the flow of events. The namenode says, when the regionserver closes a log:

2009-09-23 17:21:05,128 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blockReceived: from 10.10.21.38:50010 1 blocks.
2009-09-23 17:21:05,128 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_8594965619504827451_4351 is received from 10.10.21.38:50010
2009-09-23 17:21:05,128 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.21.38:50010 is added to blk_8594965619504827451_4351 size 573866
2009-09-23 17:21:05,130 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blockReceived: from 10.10.21.45:50010 1 blocks.
2009-09-23 17:21:05,130 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_8594965619504827451_4351 is received from 10.10.21.45:50010
2009-09-23 17:21:05,130 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.21.45:50010 is added to blk_8594965619504827451_4351 size 573866
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blockReceived: from 10.10.21.32:50010 1 blocks.
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_8594965619504827451_4351 is received from 10.10.21.32:50010
2009-09-23 17:21:05,131 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.21.32:50010 is added to blk_8594965619504827451_4351 size 573866
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.complete: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-2129099062
2009-09-23 17:21:05,131 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-2129099062
2009-09-23 17:21:05,132 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.closeFile: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 with 1 blocks is persisted to the file system
2009-09-23 17:21:05,132 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 blocklist persisted

So at this point we have 3 replicas of the block, and they have all checked in, right?

Then during logfile recovery we see:

2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.append: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542 at 10.10.21.29
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: src=/hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228, holder=DFSClient_-828773542, clientMachine=10.10.21.29, replication=512, overwrite=false, append=true
2009-09-23 17:21:45,997 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 567 Total time for transactions(ms): 9 Number of transactions batched in Syncs: 54 Number of syncs: 374 SyncTimes(ms): 12023 4148 3690 7663
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: UnderReplicationBlocks.update blk_8594965619504827451_4351 curReplicas 0 curExpectedReplicas 3 oldReplicas 0 oldExpectedReplicas 3 curPri 2 oldPri 2
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.update: blk_8594965619504827451_4351 has only 0 replicas and need 3 replicas so is added to neededReplications at priority level 2
2009-09-23 17:21:45,997 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.appendFile: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542 at 10.10.21.29 block blk_8594965619504827451_4351 block size 573866
2009-09-23 17:21:45,997 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/10.10.21.29 cmd=append src=/hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 dst=null perm=null
2009-09-23 17:21:47,265 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block blk_8594965619504827451_4351 from priority queue 2
2009-09-23 17:21:56,016 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542
2009-09-23 17:21:56,016 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /hbase/.logs/sv4borg32,60020,1253751520085/hlog.dat.1253751663228 for DFSClient_-828773542
2009-09-23 17:21:56,016
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758949#action_12758949 ]

ryan rawson commented on HDFS-200:
--
scratch the last, i was having some environment/library version problems.

In HDFS, sync() not yet guarantees data available to the new readers

Key: HDFS-200
URL: https://issues.apache.org/jira/browse/HDFS-200
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Tsz Wo (Nicholas), SZE
Assignee: dhruba borthakur
Priority: Blocker
Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch, hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java

In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says:

* A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file

However, this feature is not yet implemented. Note that the operation 'flushed' is now called sync.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
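The flushed-before-open guarantee quoted above can be modeled with plain java.nio against a local file. This is only an illustration of the contract under discussion, not HDFS code; the class and method names here are mine, and `FileChannel.force()` merely stands in for HDFS's `sync()`/`hflush()`:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncVisibility {
    // Local-file model of the HDFS-200 contract: bytes "flushed" (here,
    // force()d) before a reader opens the file must be visible to that reader.
    public static int bytesVisibleAfterFlush(byte[] payload) throws IOException {
        Path p = Files.createTempFile("hlog", ".dat");
        try (FileChannel writer = FileChannel.open(p, StandardOpenOption.WRITE)) {
            writer.write(ByteBuffer.wrap(payload));
            writer.force(true); // stands in for HDFS sync()/hflush()
            // A reader opened *after* the flush sees everything flushed so far.
            try (FileChannel reader = FileChannel.open(p, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(payload.length + 16);
                return reader.read(buf);
            }
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

The point of the issue is exactly that HDFS of this era did not honor this contract: a reader opening the file after a sync() could still see a shorter (or zero) length until the block was finalized.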
[jira] Commented: (HDFS-127) DFSClient block read failures cause open DFSInputStream to become unusable
[ https://issues.apache.org/jira/browse/HDFS-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757355#action_12757355 ]

ryan rawson commented on HDFS-127:
--
+1 this patch is a _must have_ for anyone running hbase. The lack of it in hadoop trunk is forcing us to ship a non-standard hadoop jar just for a 2-line fix. Please commit already!

DFSClient block read failures cause open DFSInputStream to become unusable

Key: HDFS-127
URL: https://issues.apache.org/jira/browse/HDFS-127
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Igor Bolotin
Attachments: 4681.patch

We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3. When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:

2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-0/20081110-163426/_0.tis
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
...

The investigation showed that the root of this issue is that we exceeded the number of xcievers on the data nodes; that was fixed by raising the configured limit to 2k. However, one thing that bothered me was that even after the datanodes recovered from overload, and most of the client servers had been shut down, we still observed errors in the logs of the running servers. Further investigation showed that the fix for HADOOP-1911 introduced another problem: the DFSInputStream instance might become unusable once the number of failures over the lifetime of the instance exceeds a configured threshold. The fix for this specific issue seems to be trivial: just reset the failure counter before reading the next block (patch will be attached shortly). This also seems related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.
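The 2-line fix described above (reset the failure counter before each block read, rather than counting failures over the stream's whole lifetime) can be sketched as a standalone model. The class, method names, and the max-failures constant here are illustrative, not DFSClient's actual fields; in real clients the threshold comes from a configurable limit on block-acquire failures:

```java
public class RetryCounterModel {
    // Illustrative threshold, standing in for the client's configured
    // maximum number of block-acquire failures.
    static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    // Pre-fix behavior: one failure counter for the stream's whole lifetime.
    // Transient failures on early blocks eventually make the stream unusable.
    static boolean readAllBlocksLifetimeCounter(int blocks, int failuresPerBlock) {
        int failures = 0;
        for (int b = 0; b < blocks; b++) {
            for (int f = 0; f < failuresPerBlock; f++) {
                if (++failures > MAX_BLOCK_ACQUIRE_FAILURES) {
                    return false; // stream gives up permanently
                }
            }
        }
        return true;
    }

    // HDFS-127-style fix: reset the counter before each block, so transient
    // failures on earlier blocks don't poison later reads.
    static boolean readAllBlocksResetPerBlock(int blocks, int failuresPerBlock) {
        for (int b = 0; b < blocks; b++) {
            int failures = 0; // the 2-line fix: reset per block
            for (int f = 0; f < failuresPerBlock; f++) {
                if (++failures > MAX_BLOCK_ACQUIRE_FAILURES) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With one transient failure per block across four blocks, the lifetime counter trips the threshold on the fourth block while the per-block counter never does, which is exactly the long-running-server symptom the reporter describes.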
[jira] Commented: (HDFS-265) Revisit append
[ https://issues.apache.org/jira/browse/HDFS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755181#action_12755181 ]

ryan rawson commented on HDFS-265:
--
hey guys, it's been 4 months since this issue started moving, can we get some source so we can review and possibly test? Thanks!

Revisit append

Key: HDFS-265
URL: https://issues.apache.org/jira/browse/HDFS-265
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: Append Branch
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: Append Branch
Attachments: appendDesign.pdf, appendDesign.pdf, appendDesign1.pdf, appendDesign2.pdf, AppendSpec.pdf, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, AppendTestPlan.html, TestPlanAppend.html

HADOOP-1700 and related issues put a lot of effort into providing the first implementation of append. However, append is a complex feature: issues that initially seemed trivial turned out to need careful design. This jira revisits append, aiming for a design and implementation with semantics that are acceptable to its users.
[jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
[ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748236#action_12748236 ]

ryan rawson commented on HDFS-200:
--
I've been testing this on my 20-node cluster here. Lease recovery can take a long time, which is a bit of an issue. The sync seems to be pretty good overall: we are recovering most of the edits up until the last flush, and it's pretty responsive. However, I have discovered a new bug. The scenario is like so:

- we roll the logs every 1MB (block size).
- we now have 18 logs to recover. The first 17 were closed properly; only the last one was in mid-write.
- during log recovery, the hbase master calls fs.append(f); out.close();
- but the master gets stuck at the out.close(); it can't seem to progress.

Investigating the logs, it looks like the namenode 'forgets' about the other 2 replicas for the block (the file is 1 block), and thus we are stuck until another replica comes back. I've attached logs, hadoop fsck output, and stack traces from hbase.
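One way to picture the hang described above: out.close() blocks indefinitely while the namenode has no live replica to finalize the block against. A bounded retry around the append-and-close step (purely an illustrative model; the names below are mine and HBase's actual recovery code differs) would at least surface the failure instead of wedging the master:

```java
public class LeaseRecoveryModel {
    // Stand-in for the fs.append(f) + out.close() step; returns true once
    // the namenode again knows about a live replica of the last block.
    public interface LogRecovery {
        boolean tryAppendAndClose();
    }

    // Bounded retry: report which attempt succeeded, or -1 to give up
    // instead of blocking forever inside close().
    public static int recover(LogRecovery fs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.tryAppendAndClose()) {
                return attempt;
            }
        }
        return -1;
    }
}
```

This doesn't fix the underlying bug (the namenode dropping its knowledge of the other replicas); it only bounds how long the master is held hostage by it.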